

THE ANNALS
OF MATHEMATICAL STATISTICS


FOUNDED AND EDITED BY H.C. CARVER, 1930-1938 
EDITED BY S. S. WILKS, 1938-1949


THE OFFICIAL JOURNAL OF THE INSTITUTE
OF MATHEMATICAL STATISTICS 


Contents 


The Problem of the Greater Mean. RAGHU RAJ BAHADUR AND HERBERT ROBBINS


Distributions Related to Comparison of Two Means and Two Regression
Coefficients. UTTAM CHAND


The Extremal Quotient. E. J. GUMBEL AND R. D. KEENEY


On a Preliminary Test for Pooling Mean Squares in the Analysis of Variance.


Estimating the Mean and Variance of Normal Populations from Singly Truncated
and Doubly Truncated Samples. A. C. COHEN, JR.


The Asymptotic Properties of Estimates of the Parameters of a Single Equa- 
tion in a Complete System of Stochastic Equations. T. W. ANDERSON 
AND HERMAN RUBIN 


Some Nonparametric Tests of Whether the Largest Observations of a Set are 
Too Large or Too Small. JOHN E. WALSH


On a Measure of Dependence between Two Random Variables. NILS
BLOMQVIST


Some Two Sample Tests. DOUGLAS G. CHAPMAN
Notes: 


Transformations Related to the Angular and the Square Root. MURRAY
F. FREEMAN AND JOHN W. TUKEY 


Remark on the Article ‘On a Class of Distributions that Approach the
Normal Distribution Function’ by George B. Dantzig. T. N. E.
GREVILLE


Independence of Quadratic Forms in Normally Correlated Variables.
YUKIYOSI KAWADA


Errata to “Control Chart for Largest and Smallest Values.’’ 


Abstracts of Papers 
News and Notices 
Report of the Berkeley Meeting of the Institute 


Vol. 21, No. 4 — December, 1950 





THE ANNALS
OF MATHEMATICAL STATISTICS 


EDITOR
T. W. ANDERSON


ASSOCIATE EDITORS
R. C. BOSE          M. A. GIRSHICK      ALEXANDER M. MOOD
W. FELLER           E. L. LEHMANN       JOHN W. TUKEY
WITH THE COOPERATION OF 

M. S. BARTLETT       PAUL S. DWYER          J. NEYMAN
DAVID BLACKWELL      CHURCHILL EISENHART    H. E. ROBBINS
GEORGE W. BROWN      T. E. HARRIS           S. N. ROY
HARALD CRAMÉR        PAUL G. HOEL           HENRY SCHEFFÉ
WILLIAM G. COCHRAN   HAROLD HOTELLING       WALTER A. SHEWHART
J. F. DALY           HOWARD LEVENE          A. WALD
W. EDWARDS DEMING    WILLIAM G. MADOW       JACOB WOLFOWITZ
J. L. DOOB           H. B. MANN             MAX A. WOODBURY
                     FREDERICK MOSTELLER


Published quarterly by the Institute of Mathematical Statistics in March, June,
September and December at Baltimore, Maryland.

INSTITUTE OF MATHEMATICAL STATISTICS
General Office: Business Administration Building, University of Michigan,
Ann Arbor, Michigan
C. H. Fischer, Secretary-Treasurer
This address should be used for all communications concerning
membership, subscriptions, changes of address, back numbers,
etc., but not for editorial correspondence. Changes in mailing
address which are to become effective for a given issue should be
reported to the Secretary on or before the 15th of the month
preceding the month of that issue.


Department of Mathematical Statistics, Columbia University,
New York 27, New York
T. W. Anderson, Editor
Manuscripts should be submitted to this address; each manuscript
should be typewritten, double-spaced with wide margins,
and the original copy should be submitted (preferably with one
additional copy). Footnotes should be reduced to a minimum
and whenever possible replaced by a bibliography at the end of
the paper; formulae in footnotes should be avoided. Figures, charts,
and diagrams should be professionally drawn on plain white
paper or tracing cloth in black India ink twice the size they
are to be printed. Authors are requested to keep in mind typographical
difficulties of complicated mathematical formulae.
Authors will ordinarily receive only galley proofs. Fifty reprints
without covers will be furnished free. Additional reprints and
covers furnished at cost.
Subscription Price: $10.00 per year inside the Western Hemisphere; $5.00 elsewhere.
Single issues $3.00. Back numbers are available at $10.00 per volume
or $3.00 per single issue.


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, INC., BALTIMORE, MARYLAND, U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879. 
Copyright, 1950, by the Institute of Mathematical Statistics. 



















THE PROBLEM OF THE GREATER MEAN 


By RAGHU RAJ BAHADUR AND HERBERT ROBBINS¹
University of North Carolina 


1. Introduction and summary. Let $\pi_1, \pi_2$ be normal populations with means
$m_1, m_2$ respectively and a common variance $\sigma^2$, the parameter point
$\omega = (m_1, m_2 : \sigma)$ which characterizes the two populations being unknown, and
let $\Omega$ be an arbitrary given set of possible points $\omega$. Random samples of fixed
sizes $n_1, n_2$ are drawn from $\pi_1, \pi_2$ respectively, giving the combined sample
point $v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. For reasons which will be
made clear later in connection with practical examples, any function $f(v)$ such
that $0 \le f(v) \le 1$ is called a decision function, and for any such $f(v)$ the risk
function is defined to be

\[ r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega] \ge 0, \tag{1} \]


where $E$ denotes the expectation operator. A decision function $\bar f(v)$ is said to be
(a) uniformly better than $f(v)$ if $r(\bar f \mid \omega) \le r(f \mid \omega)$ for all $\omega$ in $\Omega$, the strict
inequality holding for at least one $\omega$, (b) admissible if no decision function is
uniformly better than $\bar f(v)$, and (c) minimax if

\[ \sup_{\omega \in \Omega} [r(\bar f \mid \omega)] = \inf_{f} \sup_{\omega \in \Omega} [r(f \mid \omega)]. \]

The “problem of the greater mean” is, for any given $\Omega$, to determine the minimax
decision functions, particularly those which are also admissible. Special
interest attaches to the case in which there exists a unique minimax decision
function $\bar f(v)$ (in the sense that if $f(v)$ is any minimax decision function then
$f(v) = \bar f(v)$ for almost every $v$ in the sample space); such an $\bar f(v)$ is automatically
admissible.

The problem of the greater mean is, of course, a special problem in Wald’s 
general theory of statistical decision functions [1]. Our results will, however, be 
derived by very simple direct methods which make no use of Wald’s general 
theorems. 

We cite without proofs a few examples in order to show how strongly the
solution of the problem of the greater mean depends on the structure of $\Omega$. In
each case the minimax decision function is a function only of the two sample
means $\bar x_1, \bar x_2$.

(i) Let $\Omega'$ consist of the two points $(a, b : \sigma)$ and $(b, a : \sigma)$, with $a < b$. Then

\[ f^*(v) = \begin{cases} 1 & \text{if } n_1 \bar x_1 - n_2 \bar x_2 > (n_1 - n_2)(a + b)/2, \\ 0 & \text{otherwise,} \end{cases} \tag{2} \]

is the unique minimax decision function.


¹ This work was supported in part by the Office of Naval Research.










(ii) Let $\Omega''$ consist of the two points $(c + h, c : \sigma)$ and $(c - h, c : \sigma)$, with $h > 0$.
Then

\[ f_c^0(v) = \begin{cases} 1 & \text{if } \bar x_1 > c, \\ 0 & \text{otherwise,} \end{cases} \tag{3} \]

is the unique minimax decision function.
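(iii) Let $\Omega'''$ consist of the three points $(\tfrac12, -\tfrac12 : 1)$, $(\tfrac12, \tfrac32 : 1)$, $(-\tfrac32, -\tfrac12 : 1)$, and
let $n_1 = n_2 = n$. Then

\[ f^{**}(v) = \begin{cases} 1 & \text{if } e^{-2n\bar x_1} + e^{2n\bar x_2} < \lambda, \\ 0 & \text{otherwise,} \end{cases} \tag{4} \]

where $\lambda$ is a certain definite constant, is the unique minimax decision function.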

The parameter spaces of two or three points specified in these examples are
rather trivial, but in fact the corresponding decision functions (2), (3), (4) remain
the unique minimax solutions of the decision problem with respect to
much more general parameter spaces. Thus, for example, it is clear that $f^*(v)$
will remain the unique minimax decision function with respect to any $\Omega$ which
contains $\Omega'$ and is such that

\[ \sup_{\omega \in \Omega} [r(f^* \mid \omega)] = \sup_{\omega \in \Omega'} [r(f^* \mid \omega)]. \]

Corresponding remarks apply to $f_c^0(v)$ and $f^{**}(v)$.
When $n_1 = n_2$, (2) reduces to

\[ f^0(v) = \begin{cases} 1 & \text{if } \bar x_1 > \bar x_2, \\ 0 & \text{otherwise.} \end{cases} \tag{5} \]

This decision function is of particular interest when both the means $m_1, m_2$ are
unknown. It will be shown that whether or not $n_1 = n_2$, $f^0(v)$ is the unique
minimax decision function under certain conditions on $\Omega$ which are likely to
hold in practice, at least when both $n_1$ and $n_2$ are sufficiently large (Theorem 3).
Likewise, $f_c^0(v)$, which is the analogue of $f^0(v)$ when one of the means ($m_2$) is
known exactly, is apt to be the unique minimax decision function in such cases,
at least when $n_1$ is sufficiently large (Theorem 4). These results on $f^0(v)$ and
$f_c^0(v)$ form the main results of the present paper.

So much by way of a general summary. We shall now give a practical
illustration (another is given in Section 3) to show how the problem of the greater
mean arises in applications.

Suppose that a consumer requires a certain number of manufactured articles
which can be supplied at the same cost by each of two sources $\pi_1$ and $\pi_2$. The
quality of an article is measured by a numerical characteristic $x$, and it is known
that in the product of $\pi_i$, $x$ is normally distributed with mean $m_i$ and variance
$\sigma^2$, but the values of these parameters are unknown. The consumer has obtained
a random sample of $n_1$ and $n_2$ articles from $\pi_1$ and $\pi_2$ respectively, and
has found the values of $x$ to be $(x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2}) = v$.
What is the best way of ordering a total of $N$ articles from the two sources?










The usual statistical theory, which confines itself to estimating the unknown
parameters and to testing hypotheses of the form $H_0(m_1 = m_2)$, has at best an
indirect bearing on the problem at hand. We therefore adopt Wald’s point of
view and investigate the consequences of any given course of action. If the
consumer orders $fN$ articles from $\pi_1$ and $(1 - f)N$ from $\pi_2$, where $0 \le f \le 1$,
then the expectation of the sum of the $x$-values in the articles he obtains will
be $N(m_1 f + m_2(1 - f))$. The maximum possible value of this quantity is
$N \max[m_1, m_2]$, and the “loss” per article which he sustains may therefore be
taken as

\[ W(\omega, f) = \max[m_1, m_2] - m_1 f - m_2(1 - f) \ge 0, \]

where $\omega = (m_1, m_2 : \sigma)$ is the true parameter point.
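
As a minimal sketch of the loss just defined (the function name and the numerical values are illustrative assumptions, not taken from the paper), the loss per article can be evaluated for a few ordering fractions when the two means are supposed known:

```python
def loss_per_article(m1: float, m2: float, f: float) -> float:
    """Loss W(omega, f) = max[m1, m2] - m1*f - m2*(1 - f).

    f is the fraction of the order placed with source 1; the means m1, m2
    are treated as known here purely for illustration.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return max(m1, m2) - m1 * f - m2 * (1.0 - f)


# Hypothetical means: source 2 is the better one.
for fraction in (0.0, 0.25, 1.0):
    print(fraction, loss_per_article(10.0, 12.0, fraction))
```

The loss vanishes only when the whole order goes to the better source, and it grows linearly in the fraction misallocated.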

The consumer wants to choose $f$ so as to make $W$ as small as possible. If
he knew $m_1$ to be greater, or to be less, than $m_2$, then by choosing $f = 1$ or $0$
respectively he could make $W = 0$. But since he does not know which $m_i$ is
the greater he will presumably choose $f$ as some function of the sample point $v$.
Suppose, therefore, that a “decision function” $f(v)$, such that $0 \le f(v) \le 1$ but
not necessarily taking on only the values 0 and 1, is defined for all points $v$ in
the sample space and that the consumer sets $f = f(v)$.² In repeated applications
of this procedure, the “risk” or expected loss (a double expectation is involved:
the expected loss for a given $f$ and the expected value of $f$ in using the
decision function $f(v)$) per article is given by (1), and the consumer will try to
find an $f(v)$ which minimizes this risk. Since the value of the risk depends on $\omega$
it is necessary to specify which values of $\omega$ are to be regarded as possible in
the given problem; let the set of all such $\omega$ be denoted by $\Omega$. If the consumer
agrees to adopt the “conservative” criterion of minimizing the maximum possible
risk, then the statistician’s problem is to find the minimax decision functions
in the sense defined above. We have given the solutions of this problem
for certain types of parameter spaces. The reader will observe that each of the
minimax decision functions (2), (3), (4) was of the “all or nothing” type, with
values 0 and 1 only. (Whether this remains true for every $\Omega$ we do not know.)
By using one of these decision functions in a given instance one arrives at either
the best possible decision or the worst. The attitudes of doubt sometimes associated
with the non-rejection of the hypothesis $H_0(m_1 = m_2)$ are therefore


² One might say that the consumer should choose $f$ in the light of what he can infer from
$v$ about the $m_i$. But this formulation as a problem in ordinary statistical inference (estimation
and testing) is not relevant and may be misleading. For example, a plausible $f(v)$,
based on the idea that the problem is one of testing hypotheses, is as follows: “Perform the
two-tailed $t$ test of $H_0(m_1 = m_2)$ at the five per cent level. If $H_0$ is rejected set $f = 0$ or $1$
according as $\bar x_1$ is less than or greater than $\bar x_2$. If $H_0$ is not rejected set $f = \tfrac12$.” Another
$f(v)$, based on the theory of estimation, according to which the $\bar x_i$ are the “best” estimates
of the $m_i$, is as follows: “Set $f = 0$ or $1$ according as $\bar x_1$ is less than or greater than $\bar x_2$.”
Actually, the latter procedure is, from the remarks above concerning (5), the “best” in
a certain definite sense and under certain conditions, but this fact does not follow from the
usual theory of estimation.







irrelevant to the problem of the greater mean in the examples cited. (Cf. footnote 2;
also Example 1 in Section 3.)

The risk function (1) is but one of a general class $R$ of risk functions, to be
defined in Section 2, which are associated with the problem of the greater mean.
The most important members of $R$ are (1) and

\[ \bar r(f \mid \omega) = P(\text{incorrect decision using } f(v) \mid \omega), \tag{6} \]

where “$m_1 < m_2$” and “$m_1 > m_2$” are the two possible decisions. The risk function
(6) is relevant to applications of a purely “scientific” nature in which the
statistician is asked merely to give his opinion as to which population has
the greater mean. Although the problem of constructing a suitable decision
function for (6) is akin in spirit to the problems considered in the now classical
Neyman-Pearson theory of statistical tests, no satisfactory solutions seem to
be available. It is easy to see, however, that (1) and (6) are quite similar. Of
course, in the case of (1) a decision function $f(v)$ may take on any value between
0 and 1 inclusive, while for (6) we allow only functions which take on
only the values 0 and 1, corresponding respectively to the decisions “$m_1 < m_2$”
and “$m_1 > m_2$”. We then have for any such $f(v)$,

\[ \bar r(f \mid \omega) = \begin{cases} P(f(v) = 1 \mid \omega) = E[f \mid \omega] & \text{if } m_1 < m_2, \\ P(f(v) = 0 \mid \omega) = E[1 - f \mid \omega] & \text{if } m_1 > m_2, \\ 0 & \text{if } m_1 = m_2, \end{cases} \tag{6'} \]

and by comparison with (1) we see that $r(f \mid \omega) = |m_1 - m_2|\, \bar r(f \mid \omega)$ for all $\omega$.


Now, in the three examples (i), (ii), (iii) cited above the unique minimax decision
functions happen to take on only the values 0 and 1, and $|m_1 - m_2|$ is constant
on each of the respective parameter sets. It follows that (2), (3), (4) are also
the unique minimax decision functions relative to (6) and to $\Omega'$, $\Omega''$, $\Omega'''$ respectively.
The remarks above following Example (iii) also remain valid for the
risk function (6).

We conclude this section with a remark on the methods of this paper. Any
decision function relevant to (6) is equivalent to a test of the hypothesis $H_0(m_1 < m_2)$
against the alternative $H_1(m_1 > m_2)$, the region $\{v : f(v) = 1\}$ being the
“critical region.” Hence the Neyman-Pearson probability ratio method can be
used to obtain the unique minimax decision function with respect to (6) and
an $\Omega$ consisting of two (or more) points, and the result carries over to more
general types of $\Omega$ in the manner already indicated. It turns out, however, that
the dominant properties of the probability ratio tests are not confined to
the class of tests alone, but extend to the class of all functions $f(v)$ such that
$0 \le f(v) \le 1$. This result (Theorem 1) enables us to solve the problem of the
greater mean for the risk function (1) as well as for (6). The reader who is interested
in applications may turn to Section 3.


2. Theorems. We require the following slight generalization of a well-known 
result of Neyman and Pearson [2]. 










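THEOREM 1. Let $\phi(v), \phi_1(v), \phi_2(v), \ldots, \phi_r(v)$ be summable functions defined on
a measure space $E$ with points $v$ and measure $\mu$, $\mu(E) \le \infty$, let $c_1, \ldots, c_r$ be
arbitrary constants, and let $A \subset E$ be such that

\[ \begin{cases} v \in A \text{ implies } \phi(v) \ge \sum_{i=1}^{r} c_i \phi_i(v), \\ v \in E - A \text{ implies } \phi(v) \le \sum_{i=1}^{r} c_i \phi_i(v). \end{cases} \tag{7} \]

Set

\[ \int_A \phi_i \, d\mu = a_i \qquad (i = 1, \ldots, r), \tag{8} \]

and let $f(v)$ be any measurable function such that

\[ 0 \le f(v) \le 1 \tag{9} \]

and such that

\[ \int_E f \phi_i \, d\mu = a_i \qquad (i = 1, \ldots, r). \tag{10} \]

Then

\[ \int_E f \phi \, d\mu \le \int_A \phi \, d\mu. \tag{11} \]

PROOF.

\[
\begin{aligned}
\int_E f\phi \, d\mu &= \int_A f\phi \, d\mu + \int_{E-A} f\phi \, d\mu \\
&\le \int_A f\phi \, d\mu + \int_{E-A} f \Big(\sum_{i=1}^{r} c_i \phi_i\Big) d\mu && \text{by (9), (7),} \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \Big[\int_E f\phi_i \, d\mu - \int_A f\phi_i \, d\mu\Big] \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \Big[a_i - \int_A f\phi_i \, d\mu\Big] && \text{by (10),} \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \int_A (1 - f)\phi_i \, d\mu && \text{by (8),} \\
&= \int_A \phi \, d\mu - \int_A (1 - f)\phi \, d\mu + \int_A (1 - f)\Big(\sum_{i=1}^{r} c_i \phi_i\Big) d\mu \\
&= \int_A \phi \, d\mu - \int_A (1 - f)\Big(\phi - \sum_{i=1}^{r} c_i \phi_i\Big) d\mu \\
&\le \int_A \phi \, d\mu && \text{by (9), (7).}
\end{aligned}
\]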
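
The inequality (11) can be spot-checked numerically on a finite measure space. The following is only an illustrative sketch under stated assumptions: the space is a finite set of points with arbitrary positive masses, there is a single constraint function ($r = 1$) with a nonnegative constant $c_1$, and the side condition is used in the relaxed form $\int_E f\phi_1\,d\mu \le a_1$, which the proof above still covers when $c_1 \ge 0$. All names and numbers are hypothetical.

```python
import random


def np_bound_holds(n_points: int = 200, trials: int = 500, seed: int = 0) -> bool:
    """Randomly test  sum(mu*f*phi) <= sum over A of mu*phi  on a finite space."""
    rng = random.Random(seed)
    mu = [rng.uniform(0.1, 1.0) for _ in range(n_points)]   # point masses
    phi = [rng.uniform(0.0, 1.0) for _ in range(n_points)]
    phi1 = [rng.uniform(0.0, 1.0) for _ in range(n_points)]
    c1 = 0.8                                                 # nonnegative constant

    # A is the set where phi >= c1 * phi1, as in (7).
    in_a = [p >= c1 * q for p, q in zip(phi, phi1)]
    a1 = sum(m * q for m, q, s in zip(mu, phi1, in_a) if s)     # as in (8)
    bound = sum(m * p for m, p, s in zip(mu, phi, in_a) if s)   # right side of (11)

    for _ in range(trials):
        f = [rng.random() for _ in range(n_points)]             # 0 <= f <= 1
        used = sum(m * fi * q for m, fi, q in zip(mu, f, phi1))
        if used > a1:
            # Shrink f so that the side condition holds; f stays in [0, 1].
            f = [fi * a1 / used for fi in f]
        value = sum(m * fi * p for m, fi, p in zip(mu, f, phi))
        if value > bound + 1e-9:
            return False
    return True


print(np_bound_holds())
```

A random search of this kind cannot prove the theorem; it only fails to find a counterexample, which is what the proof predicts.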










NOTE 1. If the condition

\[ \mu\Big\{v : \phi(v) = \sum_{i=1}^{r} c_i \phi_i(v)\Big\} = 0 \tag{12} \]

holds, then in order that the equality hold in (11) it is necessary and sufficient that

\[ f(v) = \chi_A(v) \quad \text{a.e. } (\mu), \tag{13} \]

where $\chi_A(v)$ is the characteristic function of the set $A$,

\[ \chi_A(v) = \begin{cases} 1 & \text{if } v \in A, \\ 0 & \text{if } v \in E - A. \end{cases} \]

PROOF. The sufficiency is obvious. To prove the necessity we observe from the
proof of Theorem 1 that for equality to hold in (11) it is necessary that

\[ f(v)\Big(\phi(v) - \sum_{i=1}^{r} c_i \phi_i(v)\Big) = 0 \quad \text{a.e. } (\mu) \text{ in } E - A, \]

and that

\[ (1 - f(v))\Big(\phi(v) - \sum_{i=1}^{r} c_i \phi_i(v)\Big) = 0 \quad \text{a.e. } (\mu) \text{ in } A. \]

These relations and (12) imply (13).
NOTE 2. If relations (10) are replaced by

\[ \int_E f \phi_i \, d\mu \le a_i \qquad (i = 1, \ldots, r), \tag{10'} \]

and if each of the constants $c_i$ is non-negative, then Theorem 1 and Note 1 remain
valid.

Theorem 1 has applications to a number of decision problems of a certain
type. In the present paper we consider only the “problem of the greater mean”
for two normal populations with a common variance $\sigma^2$, where at least one of
the means $m_1, m_2$ is unknown. The following assumptions and definitions will
be valid henceforth.

(A) $E_N$ is the $N = n_1 + n_2$ dimensional sample space of points
$v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. A measurable function $f(v)$ defined
for all $v$ in $E_N$ is a decision function if $0 \le f(v) \le 1$. $f_1(v) = f_2(v)$ means
$f_1(v) = f_2(v)$ for almost every $v$ in $E_N$.

(B) $\Omega$ is a given set of points $\omega = (m_1, m_2 : \sigma)$, $\sigma > 0$. Given $\omega$ in $\Omega$, the probability
measure in $E_N$ is that generated by the distribution function

\[ K(v \mid \omega) = \prod_{i=1}^{2} \prod_{j=1}^{n_i} G[(x_{ij} - m_i)/\sigma], \]

where

\[ G(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} e^{-y^2/2} \, dy. \]

Given any function $\phi = \phi(v)$ for which the integral exists we write

\[ E[\phi \mid \omega] = \int_{E_N} \phi(v) \, dK(v \mid \omega). \]
(C) Let $\gamma(\omega) = (g_1, g_2)$ be a function defined for all $\omega$ in $\Omega$, with values in
$E_2$, and such that

\[ m_i \le m_j \text{ implies } g_i \le g_j \qquad (i, j = 1, 2). \tag{14} \]

Given $p$, $0 \le p \le 1$, we define

\[ W(\omega, p) = \max[g_1, g_2] - g_1 p - g_2(1 - p), \]

and given a decision function $f(v)$ we define the risk function

\[ r(f \mid \omega) = E[W(\omega, f) \mid \omega] = W(\omega, E[f \mid \omega]) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega]. \tag{15} \]

The class of risk functions (15) corresponding to all functions $\gamma(\omega)$ which satisfy
(14) is denoted by $R$. (The two most important members of $R$ are (1), with

\[ \gamma(\omega) = (m_1, m_2), \]

and (6), with

\[ \gamma(\omega) = \begin{cases} (0, 1) & \text{if } m_1 < m_2, \\ (1, 0) & \text{if } m_1 > m_2, \\ (0, 0) & \text{if } m_1 = m_2. \end{cases} \]

The risk functions (1) and (6) appear in the examples in Section 3.) Throughout
this section $r(f \mid \omega)$ will denote a fixed but arbitrary member of $R$. We shall use
the notations

\[ h(\omega) = |g_1 - g_2|, \qquad d(\omega) = \Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1/2} (m_1 - m_2)/\sigma, \qquad \bar x_i = n_i^{-1} \sum_{j=1}^{n_i} x_{ij} \quad (i = 1, 2). \]

THEOREM 2. Let $\omega_1 = (m_1, m_2 : \sigma)$ and $\omega_2 = (\mu_1, \mu_2 : \sigma)$ be two parameter points
such that

\[ d(\omega_1) < 0, \qquad d(\omega_2) > 0, \qquad h(\omega_1) h(\omega_2) > 0. \]

For any $\lambda$, $-\infty \le \lambda \le \infty$, let $f_\lambda(v)$ be the characteristic function of the set

\[ A_\lambda = \{v : n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2 > \lambda\sigma\}. \tag{16} \]

Then
(i) Corresponding to any decision function $f(v)$, there exists a $\lambda$ such that

\[ r(f_\lambda \mid \omega_1) = r(f \mid \omega_1), \qquad r(f_\lambda \mid \omega_2) \le r(f \mid \omega_2); \]

the inequality is strict unless $f(v) = f_\lambda(v)$.
(ii) Given any $\lambda$, if $f(v)$ is a decision function such that

\[ r(f \mid \omega_i) \le r(f_\lambda \mid \omega_i) \qquad (i = 1, 2), \]

then

\[ f(v) = f_\lambda(v). \]

(iii) There exists a unique $c$ such that

\[ r(f_c \mid \omega_1) = r(f_c \mid \omega_2) = \beta \text{ say}, \tag{17} \]

and for any decision function $f(v)$ we have

\[ \beta \le \max[r(f \mid \omega_1), r(f \mid \omega_2)]; \tag{18} \]

the inequality is strict unless $f(v) = f_c(v)$. It follows that $f_c(v)$ is the unique minimax
decision function corresponding to the two-point parameter space $\Omega = (\omega_1, \omega_2)$.

PROOF.³ (a) Let $\phi(v)$, $\phi_1(v)$ be the joint frequency functions of the sample
point $v$ corresponding to the parameter points $\omega_2$, $\omega_1$ respectively. It is readily
seen that for any $\lambda$ there exists a unique constant $a = a(\lambda)$, $0 \le a(\lambda) \le \infty$, such
that

\[ A_\lambda = \{v : \phi(v) > a\,\phi_1(v)\} \]

($a(-\infty) = 0$, $a(\infty) = \infty$). Moreover, since $\omega_1 \ne \omega_2$,

\[ \mu\{v : \phi(v) = a\,\phi_1(v)\} = 0. \]

It follows from Theorem 1, Note 2, that if $f(v)$ is any decision function such
that

\[ E[f \mid \omega_1] \le E[f_\lambda \mid \omega_1], \]

then

\[ E[f \mid \omega_2] \le E[f_\lambda \mid \omega_2], \]

and the strict inequality holds unless $f(v) = f_\lambda(v)$.
(b) It is clear from the definition (16) that for any fixed parameter point $\omega$
the function

\[ E[f_\lambda \mid \omega] = P(A_\lambda \mid \omega) \]

is continuous and strictly decreasing from 1 to 0 as $\lambda$ varies from $-\infty$ to $+\infty$.
(c) For any decision function $f(v)$ and any parameter point $\omega$ we have by (C),

\[ r(f \mid \omega) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega]. \]

Hence

\[ \begin{cases} r(f \mid \omega_1) = h(\omega_1) E[f \mid \omega_1], & h(\omega_1) > 0, \\ r(f \mid \omega_2) = h(\omega_2) E[1 - f \mid \omega_2], & h(\omega_2) > 0. \end{cases} \tag{19} \]

Since for any decision function $f(v)$, $0 \le E[f \mid \omega_1] \le 1$, we can by (b) choose $\lambda$
so that

\[ E[f_\lambda \mid \omega_1] = E[f \mid \omega_1], \tag{20} \]

and by (a) it follows that unless $f(v) = f_\lambda(v)$,

\[ E[f_\lambda \mid \omega_2] > E[f \mid \omega_2]. \tag{21} \]

(i). Follows from (19), (20) and (21).
(ii). Follows from (19) and (a).
(iii). (17) follows from (19) and (b). Then (18) follows from (17) and (ii).

³ Theorem 2 (as also Example (iii) of Section 1) could be derived from Wald’s general
results on the completeness of the class of Bayes solutions of statistical decision problems.
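
The constant $c$ of Theorem 2(iii) can be found numerically for a given pair of points. The sketch below is an illustration only, under these assumptions: the risk function is (1), so that $h(\omega) = |m_1 - m_2|$; the set in (16) is read with threshold $\lambda\sigma$; and the statistic $n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2$ is normal with variance $\sigma^2[n_1(\mu_1 - m_1)^2 + n_2(\mu_2 - m_2)^2]$. The function names and the bisection bracket are hypothetical choices.

```python
import math


def G(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_point_minimax(m1, m2, mu1, mu2, sigma, n1, n2):
    """Return (c, beta) equalizing the risks of f_lambda at omega_1 and omega_2.

    omega_1 = (m1, m2 : sigma) must have m1 < m2 and omega_2 = (mu1, mu2 : sigma)
    must have mu1 > mu2. Risk function (1) is assumed.
    """
    if not (m1 < m2 and mu1 > mu2):
        raise ValueError("need d(omega_1) < 0 and d(omega_2) > 0")
    k1, k2 = n1 * (mu1 - m1), n2 * (mu2 - m2)       # coefficients of the sample means
    sd = sigma * math.sqrt(n1 * (mu1 - m1) ** 2 + n2 * (mu2 - m2) ** 2)
    mean1 = k1 * m1 + k2 * m2                       # mean of the statistic under omega_1
    mean2 = k1 * mu1 + k2 * mu2                     # mean of the statistic under omega_2
    h1, h2 = m2 - m1, mu1 - mu2

    def risks(lam):
        p1 = G((mean1 - lam * sigma) / sd)          # P(A_lambda | omega_1)
        p2 = G((mean2 - lam * sigma) / sd)          # P(A_lambda | omega_2)
        return h1 * p1, h2 * (1.0 - p2)

    # r(f_lambda | omega_1) decreases and r(f_lambda | omega_2) increases in lambda.
    lo = (min(mean1, mean2) - 40.0 * sd) / sigma
    hi = (max(mean1, mean2) + 40.0 * sd) / sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        r1, r2 = risks(mid)
        if r1 > r2:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, risks(c)[0]


# Example (i) of Section 1 with a = 0, b = 1, sigma = 1, n1 = 9, n2 = 4.
c, beta = two_point_minimax(0.0, 1.0, 1.0, 0.0, 1.0, 9, 4)
print(c, beta)
```

For this choice the equalizing threshold should agree with the right-hand side of (2), $(n_1 - n_2)(a + b)/2$, because the statistic in (16) then reduces to $n_1\bar x_1 - n_2\bar x_2$.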


Theorem 2 provides the solution of any problem of the greater mean when $\Omega$
consists of just two points $\omega_1, \omega_2$. For, the problem is trivial unless $d(\omega_1) d(\omega_2) < 0$
and $h(\omega_1) h(\omega_2) > 0$, and in the non-trivial case the unique minimax decision
function is $f_c(v)$ defined by (17). Moreover, it follows at once from the definition
that if $f(v)$ is the unique minimax decision function with respect to some
parameter set $\Omega_0$, then it remains so with respect to any $\Omega$ such that $\Omega \supset \Omega_0$ and

\[ \sup_{\omega \in \Omega} [r(f \mid \omega)] = \sup_{\omega \in \Omega_0} [r(f \mid \omega)]. \]

By taking sets $\Omega_0$ which consist of two points, Theorem 2 can therefore be used
to obtain sufficient conditions for an $f(v) = f_c(v)$ to be the unique minimax
decision function with respect to a quite general $\Omega$. (It is clear that results
analogous to Theorem 2(iii) but pertaining to more than two parameter points
can be derived from Theorem 1, and that these results can be exploited in a
similar way. An instance of this procedure where $\Omega_0$ consists of three points will
be given at the end of this section.)

The theorems which follow exploit Theorem 2 in this way to obtain conditions
on $\Omega$ under which the decision functions $f^0(v)$ and $f_c^0(v)$ defined by (5) and (3)
are minimax. We consider $f^0(v)$ first. From (C) we have, after a simple computation,

\[ r(f^0 \mid \omega) = h(\omega) \cdot G(-|d(\omega)|). \tag{22} \]
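
Formula (22) is easy to evaluate directly. The following sketch computes it for the two risk functions (1) and (6); the function names and the `kind` switch are illustrative assumptions, and $G$ is obtained from the complementary error function.

```python
import math


def G(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def d(m1, m2, sigma, n1, n2):
    """d(omega) = (1/n1 + 1/n2)^(-1/2) (m1 - m2) / sigma."""
    return (m1 - m2) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def risk_f0(m1, m2, sigma, n1, n2, kind="difference"):
    """Risk of f0 (choose the population with the larger sample mean), as in (22).

    kind="difference": risk function (1), h = |m1 - m2|.
    kind="error":      risk function (6), h = 1 unless m1 == m2.
    """
    if kind == "difference":
        h = abs(m1 - m2)
    elif kind == "error":
        h = 0.0 if m1 == m2 else 1.0
    else:
        raise ValueError("kind must be 'difference' or 'error'")
    return h * G(-abs(d(m1, m2, sigma, n1, n2)))


# Hypothetical values: unit difference in means, sigma = 1, four observations each.
print(risk_f0(0.0, 1.0, 1.0, 4, 4))
print(risk_f0(0.0, 1.0, 1.0, 4, 4, kind="error"))
```

Since (22) depends on the means only through $|m_1 - m_2|$, interchanging $m_1$ and $m_2$ leaves the value unchanged, which is the symmetry exploited in Theorem 3.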


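THEOREM 3. Suppose that there exist sequences $\{\omega_k\}$, $\{\bar\omega_k\}$ of points
$\omega_k = (m_{1k}, m_{2k} : \sigma_k)$, $\bar\omega_k = (\mu_{1k}, \mu_{2k} : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)]$ $(\ne 0, \infty)$,

(ii) $d(\omega_k) = -d(\bar\omega_k)$, $h(\omega_k) = h(\bar\omega_k)$, and $n_1 m_{1k} + n_2 m_{2k} = n_1 \mu_{1k} + n_2 \mu_{2k}$ for
every $k = 1, 2, \ldots$.

Then $f^0(v)$ is an admissible minimax decision function. If there exist
$\omega_0 = (m_1, m_2 : \sigma)$, $\bar\omega_0 = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f^0(v)$ is the
unique minimax decision function.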

PROOF. By (22) and (ii),

\[ r(f^0 \mid \omega_k) = r(f^0 \mid \bar\omega_k) \text{ for every } k. \tag{23} \]

Without loss of generality, we may assume the two sequences to be so chosen
that $h(\omega_k) = h(\bar\omega_k) > 0$ for every $k$. Then, by interchanging corresponding
members if necessary, we may assume that

\[ d(\omega_k) = -d(\bar\omega_k) < 0 \text{ for every } k. \tag{24} \]

Consider the two points $\omega_k, \bar\omega_k$ in $\Omega$ with arbitrary but fixed $k$. Writing $\omega_k, \bar\omega_k$
for $\omega_1, \omega_2$ respectively, and using conditions (ii), a simple calculation shows
that the set defined by (16) is

\[ A_\lambda = \{v : \bar x_1 - \bar x_2 > L\}, \tag{25} \]

$L$ being a strictly increasing function of $\lambda$.
Choose and fix an arbitrary decision function $f(v) \ne f^0(v)$. Comparing (5) and
(25), it follows from Theorem 2(iii) and (23) that

\[ r(f^0 \mid \omega_k) = r(f^0 \mid \bar\omega_k) < \max[r(f \mid \omega_k), r(f \mid \bar\omega_k)]. \tag{26} \]

Clearly, $f(v)$ cannot be uniformly better than $f^0(v)$ in $\Omega$. Again, from (26),

\[ r(f^0 \mid \omega_k) < \sup_{\omega} [r(f \mid \omega)], \tag{27} \]

so that, since $k$ is arbitrary,

\[ \sup_{\omega} [r(f^0 \mid \omega)] = \lim_{k} r(f^0 \mid \omega_k) \le \sup_{\omega} [r(f \mid \omega)]. \tag{28} \]

Since $f(v) \ne f^0(v)$ in the preceding argument is arbitrary, we have shown that
(a) no $f(v)$ can be uniformly better than $f^0(v)$ and (b) $\sup_{\omega} [r(f^0 \mid \omega)] = \inf_{f} \sup_{\omega} [r(f \mid \omega)]$,
i.e. that $f^0(v)$ is admissible and minimax. The last part of the theorem
follows upon setting $\omega_k = \omega_0$ in (27). This completes the proof of Theorem 3.
The conditions on $\Omega$ for $f^0(v)$ to be the unique minimax decision function may
be written as follows:
There exist $\omega_0 = (m_1, m_2 : \sigma)$, $\bar\omega_0 = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ such that

\[
\begin{aligned}
&\text{(i)}\quad r(f^0 \mid \omega_0)\,(= r(f^0 \mid \bar\omega_0)) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \quad (\ne 0, \infty), \\
&\text{(ii)}\quad \mu_1 = m_2 + \Big(\frac{n_1 - n_2}{n_1 + n_2}\Big)(m_1 - m_2), \qquad \mu_2 = m_1 + \Big(\frac{n_1 - n_2}{n_1 + n_2}\Big)(m_1 - m_2), \\
&\text{(iii)}\quad h(\omega_0) = h(\bar\omega_0).
\end{aligned}
\tag{29}
\]

For the important risk functions (1) and (6), (29)(ii) implies (29)(iii) (i.e. $h(\omega)$
depends on $|m_1 - m_2|$ alone). Moreover, when $n_1 = n_2$, (29)(ii) becomes $\mu_1 = m_2$,
$\mu_2 = m_1$. Thus for (1) and (6), when $n_1 = n_2$ the conditions (29) reduce simply
to the condition that at least two points in $\Omega$ at which the risk for $f^0(v)$ is maximum
be image points of one another in the plane $\{\omega : m_1 = m_2\}$. In particular, it follows
that if $n_1 = n_2$, and if the given set $\Omega$ is “symmetric” in the sense that whenever
$(m_1, m_2 : \sigma)$ is in $\Omega$ then $(m_2, m_1 : \sigma)$ is also in $\Omega$, then $f^0(v)$ is the unique minimax
decision function provided that it attains its maximum risk in $\Omega$, the risk function
in question being (1) or (6). There are obvious modifications (involving two
sequences of points in $\Omega$) of these remarks which assert that $f^0(v)$ is at least an
admissible minimax decision function in case $f^0(v)$ does not attain its maximum
risk in $\Omega$.

We shall now state the result analogous to Theorem 3 for the case when one
of the means is known exactly, say $m_2 = c$. The decision function $f_c^0(v)$ is defined
by (3).

THEOREM 4. Suppose that there exist sequences $\{\omega_k\}$, $\{\bar\omega_k\}$ of points $\omega_k = (c + a_k, c : \sigma_k)$,
$\bar\omega_k = (c - a_k, c : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f_c^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)]$ $(\ne 0, \infty)$,

(ii) $h(\omega_k) = h(\bar\omega_k)$ for every $k = 1, 2, \ldots$.

Then $f_c^0(v)$ is an admissible minimax decision function. If there exist $\omega_0 = (c + a, c : \sigma)$,
$\bar\omega_0 = (c - a, c : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f_c^0(v)$ is the unique minimax
decision function.

The proof (based on Theorem 2(iii)) is similar to that of Theorem 3 and will
be omitted. Note that for the risk functions (1) and (6), condition (ii) is automatically
satisfied.

The reader will have observed that results which may be obtained from
Theorem 2(iii) in the manner of Theorems 3 and 4 will assert the optimal character
of decision functions which are characteristic functions of sets of the type
$\{v : a\bar x_1 + b\bar x_2 > c\}$. The following example, cited as Example (iii) of Section 1,
shows that for arbitrary $\Omega$ the optimum decision function need not be of this
type.

Suppose that $n_1 = n_2 = n$, that $\Omega$ consists of the three points

\[ \omega_0 = (\tfrac12, -\tfrac12 : 1), \qquad \omega_1 = (\tfrac12, \tfrac32 : 1), \qquad \omega_2 = (-\tfrac32, -\tfrac12 : 1), \]

and that the risk function under consideration is given by (1) or (6). Then the
unique minimax decision function is $f^{**}(v)$ given by (4), where $\lambda > 0$ is determined
by

\[ E[1 - f^{**} \mid \omega_0] = E[f^{**} \mid \omega_1]. \tag{30} \]

The proof follows. $f^{**}(v)$ is the characteristic function of the set
$\{v : \phi(v) > c_1 \phi_1(v) + c_2 \phi_2(v)\}$, where $\phi, \phi_1, \phi_2$ are the frequency functions of the probability
distributions in $E_{2n}$ corresponding to the parameter points $\omega_0, \omega_1, \omega_2$ respectively,
with $c_1 = c_2 = e^{n}/\lambda$. Since for all $\lambda > 0$,

\[ E[f^{**} \mid \omega_1] = E[f^{**} \mid \omega_2], \]

and since a unique $\lambda > 0$ satisfying (30) certainly exists, it follows (cf. (19) and
(C)) that

\[ r(f^{**} \mid \omega_0) = r(f^{**} \mid \omega_1) = r(f^{**} \mid \omega_2) = \beta, \]

say. Let $f(v)$ be any decision function $\ne f^{**}(v)$. We shall show that

\[ \beta < \max[r(f \mid \omega_0), r(f \mid \omega_1), r(f \mid \omega_2)]. \tag{31} \]

Suppose not. Then

\[ r(f \mid \omega_1) = E[f \mid \omega_1] \le E[f^{**} \mid \omega_1] = r(f^{**} \mid \omega_1), \]
\[ r(f \mid \omega_2) = E[f \mid \omega_2] \le E[f^{**} \mid \omega_2] = r(f^{**} \mid \omega_2). \]

Then, by Theorem 1, Note 2, we must have $E[f \mid \omega_0] < E[f^{**} \mid \omega_0]$, so that

\[ r(f \mid \omega_0) = 1 - E[f \mid \omega_0] > 1 - E[f^{**} \mid \omega_0] = r(f^{**} \mid \omega_0) = \beta, \]

contrary to hypothesis. Hence (31) holds, and since $f(v) \ne f^{**}(v)$ is arbitrary
our assertion is proved. (Note that

\[ r(f^0 \mid \omega_0) = r(f^0 \mid \omega_1) = r(f^0 \mid \omega_2) \]

also, so that $f^{**}(v)$ is uniformly better than $f^0(v)$ in $\Omega$.) We remind the reader
that $f^{**}(v)$ remains the unique minimax decision function with respect to (1)
or (6) and any $\Omega$ which contains $\omega_0, \omega_1, \omega_2$, and is such that
$\sup_{\omega \in \Omega} [r(f^{**} \mid \omega)] = \beta$.
Whether a set $\Omega$ satisfies the last condition will in general depend on whether the
risk function in question is (1) or (6).
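
The explicit form of the acceptance set in this three-point example can be checked by a short computation. The lines below are a worked sketch only; they assume the three points are read as $\omega_0 = (\tfrac12, -\tfrac12 : 1)$, $\omega_1 = (\tfrac12, \tfrac32 : 1)$, $\omega_2 = (-\tfrac32, -\tfrac12 : 1)$, with $\phi, \phi_1, \phi_2$ the corresponding normal frequency functions and $n_1 = n_2 = n$.

```latex
% omega_1 differs from omega_0 only in the second mean (3/2 instead of -1/2):
\frac{\phi_1(v)}{\phi(v)}
  = \exp\Big\{-\tfrac12 \sum_{j=1}^{n}
      \big[(x_{2j} - \tfrac32)^2 - (x_{2j} + \tfrac12)^2\big]\Big\}
  = e^{\,2n\bar x_2 - n},
% omega_2 differs from omega_0 only in the first mean (-3/2 instead of 1/2):
\qquad
\frac{\phi_2(v)}{\phi(v)}
  = \exp\Big\{-\tfrac12 \sum_{j=1}^{n}
      \big[(x_{1j} + \tfrac32)^2 - (x_{1j} - \tfrac12)^2\big]\Big\}
  = e^{-2n\bar x_1 - n}.
% Hence, with c_1 = c_2 = e^{n}/\lambda,
\phi(v) > c_1\phi_1(v) + c_2\phi_2(v)
  \iff e^{-2n\bar x_1} + e^{2n\bar x_2} < \lambda .
```

Under this reading the set is not a half-space in $(\bar x_1, \bar x_2)$, which is the point of the example, and it is symmetric under the interchange of $-\bar x_1$ and $\bar x_2$, which accounts for the equality of the expectations at $\omega_1$ and $\omega_2$.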


3. Examples and discussion. In this section we shall discuss the relevance of 
Theorems 3 and 4 to two specific problems of the greater mean. The examples 
given are purely illustrative and the reader will readily construct others in which 
the statistician is faced with similar problems of decision. 

EXAMPLE 1. A farmer F has tested two varieties $\pi_1, \pi_2$ of grain in a field
experiment in which $n_i$ plots were assigned to $\pi_i$, $i = 1, 2$, all plots being of equal
area. The plot yields obtained were $y_{11}, y_{12}, \ldots, y_{1n_1}$ and $y_{21}, y_{22}, \ldots, y_{2n_2}$
bushels respectively. F gives this data to a statistician S for analysis. F is willing
to assume that the yields per plot for each of the two varieties are normally distributed
with unknown means $\mu_1, \mu_2$ and a common variance, also unknown.
F says he is particularly interested in whether the two varieties are “significantly
different.”

S is well aware that F’s interest in the varieties is not purely scientific; that
is to say, F did not perform the field experiment for the sole purpose of estimating
the unknown parameters or testing hypotheses concerning them. S also knows
that it is very unlikely that $\mu_1$ is equal to $\mu_2$.

Suppose that in fact F wishes to decide which variety he should use next
year on his land in order to make the maximum possible profit, and is afraid
that if he were to act as if the observed mean yields $\bar y_1, \bar y_2$ were the true population
mean yields, he might make a gross error. So F is willing to compromise
between the two varieties (that is, he will assign some fraction $f$ of his land to
$\pi_1$ and the rest to $\pi_2$) in case S declares that there is no evidence of the two varieties
being different.





























If this is the case, S should ask F how much it costs him to use $\pi_i$ and the
price at which he expects to sell his grain. Supposing that these quantities are
$a_i$ dollars per acre and $b$ dollars per bushel respectively, and that the area of each
plot in the field experiment was $c$ acres, S will set

$m_i$ = expected profit per acre in using variety $\pi_i$
      = $(b/c)\mu_i - a_i$ dollars $(i = 1, 2)$,
$\omega = (m_1, m_2 : \sigma)$, $\sigma^2$ being the variance of the profit per acre
      in using $\pi_i$ $(i = 1, 2)$,
$\gamma(\omega) = (m_1, m_2)$ (see Section 2, (C)),
$x_{ij} = (b/c) y_{ij} - a_i$, $\bar x_i = n_i^{-1} \sum_{j=1}^{n_i} x_{ij}$, $v = (x_{11}, \ldots, x_{1n_1}; x_{21}, \ldots, x_{2n_2})$,

so that $r(f \mid \omega)$ is given by (1) and is equal to the expected loss (in terms of profit
per acre) incurred by using the proportions $f(v)$, $1 - f(v)$ of the varieties $\pi_1, \pi_2$
as compared with using the variety with the greater mean for the whole of the
land. Then if S is satisfied that the set $\Omega$ of possible points $\omega$ satisfies the conditions
of Theorem 3 he should recommend that F use $\pi_1$ alone if $\bar x_1 > \bar x_2$ and
$\pi_2$ alone if $\bar x_2 > \bar x_1$, this being the safest procedure in the sense that it is the
minimax strategy (cf. Example 1 in [8]).
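
The recommendation of Example 1 can be sketched in a few lines. Everything numerical below is hypothetical (the yields, costs, price and plot area are invented for illustration), and the function name is an assumption; the only rule taken from the text is the conversion $x_{ij} = (b/c) y_{ij} - a_i$ followed by a comparison of the two mean profits.

```python
from statistics import mean


def recommend_variety(y1, y2, a1, a2, b, c):
    """Apply f0 to profits per acre.

    y1, y2 : plot yields (bushels) for varieties 1 and 2
    a1, a2 : cost per acre of each variety (dollars)
    b      : selling price per bushel (dollars)
    c      : area of each experimental plot (acres)
    Returns (f, xbar1, xbar2), where f = 1 means "use variety 1 alone".
    Exact ties have probability zero under the normal model; they are
    resolved here in favour of variety 2 purely by convention.
    """
    x1 = [(b / c) * y - a1 for y in y1]   # profit per acre, variety 1
    x2 = [(b / c) * y - a2 for y in y2]   # profit per acre, variety 2
    xbar1, xbar2 = mean(x1), mean(x2)
    f = 1.0 if xbar1 > xbar2 else 0.0
    return f, xbar1, xbar2


# Hypothetical data: variety 2 yields more grain but costs more per acre.
f, xbar1, xbar2 = recommend_variety(
    y1=[5.0, 6.0, 5.5], y2=[6.0, 6.5, 7.0], a1=10.0, a2=16.0, b=2.0, c=0.5
)
print(f, xbar1, xbar2)
```

With these invented figures the variety with the smaller mean yield is recommended, because the comparison is made on profit per acre rather than on yield.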


We shall illustrate by a simple example the obvious method of verifying
whether $f^0(v)$ is the minimax decision function for a given $\Omega$. We have by (22),
using the risk function (1) obtained by setting $\gamma(\omega) = (m_1, m_2)$,

\[ r(f^0 \mid \omega) = h(\omega) G(-|d(\omega)|) = |m_1 - m_2| \, G\Big(-\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big)^{-1/2} |m_1 - m_2| / \sigma\Big). \tag{32} \]


[ay & 


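Now suppose that

(33) $\Omega = \left\{\omega : a - \tfrac{l}{2} \le m_1 \le a + \tfrac{l}{2},\; b - \tfrac{l}{2} \le m_2 \le b + \tfrac{l}{2} : \sigma_0 - \rho \le \sigma \le \sigma_0\right\}, \quad l > |a - b|,$

where $a$, $b$, $l$, $\sigma_0$, $\rho$ (> 0) are certain constants. By (32), the maximum risk occurs
at some points in $\Omega$ for which $\sigma = \sigma_0$. We have

(34) $r(f^0 \mid \sigma = \sigma_0) = \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2} x\,G(-x),$

where

$x = x(\omega) = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma_0.$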










If $a = b$ and $n_1 = n_2$ we see from the remark following (29) that $f^0(v)$ is the unique
minimax decision function. Suppose therefore that $a \ne b$ or $n_1 \ne n_2$ or both.
Now

(35) $\sup_x\,[x\,G(-x)] = x_0\,G(-x_0) = .1700$ (approx.),

where $x_0 = .7518$ (approx.). If $m_1$, $m_2$ were unrestricted, $r(f^0 \mid \sigma = \sigma_0)$ would
be a maximum when $|m_1 - m_2| = x_0 \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$, by (34) and (35). Hence $f^0(v)$
will be the unique minimax decision function if these two lines intersect the square
$\{a - \tfrac{l}{2} \le m_1 \le a + \tfrac{l}{2},\; b - \tfrac{l}{2} \le m_2 \le b + \tfrac{l}{2}\}$ in such a way that at least two
points lying on these lines and in the square satisfy (29)(ii). This will be the case if

(36) $l > \max\left[\,|a - b| + y_0,\; \max(|a - b|,\, y_0) + \left|\frac{n_1 - n_2}{n_1 + n_2}\right| y_0\right],$

where

$y_0 = x_0 \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}.$

We have assumed that $l > |a - b|$, for otherwise either $m_1 < m_2$ or $m_1 > m_2$
for all $\omega$ in $\Omega$, and there is no problem. It is therefore clear that for $n_1$ and $n_2$
sufficiently large, $f^0(v)$ will be the unique minimax decision function. That (36)
is not a very strong requirement may be seen by setting $a = b$, $n_1 = 2n_2$, in
which case (36) reduces to

$l > \sigma_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$ (approx.).
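The constants quoted in (35), and the special case quoted for (36), can be checked numerically. The following is a minimal sketch, assuming that $G$ is the standard normal distribution function (an assumption that is consistent with the values .7518 and .1700) and reading (36) as $l > \max[\,|a - b| + y_0,\ \max(|a - b|, y_0) + |(n_1 - n_2)/(n_1 + n_2)|\,y_0]$ with $y_0 = x_0\sigma_0(1/n_1 + 1/n_2)^{1/2}$. The function names are illustrative and do not come from the paper.

```python
import math


def G(x):
    """Standard normal distribution function (assumed meaning of G)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def find_x0(tol=1e-12):
    """Locate the maximizer of x*G(-x) by bisection on its derivative.

    d/dx [x*G(-x)] = G(-x) - x*phi(x) is positive at 0, negative at sqrt(2),
    and changes sign exactly once on that interval.
    """
    lo, hi = 0.0, math.sqrt(2.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(-mid) - mid * phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rhs_of_36(a, b, n1, n2, sigma0, x0):
    """Right-hand side of (36) as read in the lead-in (an assumption)."""
    y0 = x0 * sigma0 * math.sqrt(1.0 / n1 + 1.0 / n2)
    unbalance = abs(n1 - n2) / (n1 + n2)
    return max(abs(a - b) + y0, max(abs(a - b), y0) + unbalance * y0)


x0 = find_x0()
sup_value = x0 * G(-x0)
print(round(x0, 4), round(sup_value, 4))

# Special case a = b, n1 = 2*n2: ratio of the bound to sigma0*(1/n1 + 1/n2)^(1/2).
n2 = 10
n1 = 2 * n2
ratio = rhs_of_36(0.0, 0.0, n1, n2, 1.0, x0) / math.sqrt(1.0 / n1 + 1.0 / n2)
print(round(ratio, 3))
```

With $a = b$ and $n_1 = 2n_2$ the right-hand side of (36) equals $(4/3)\,y_0$, so the ratio printed last is $(4/3)x_0$, which is close to 1; this is where the "(approx.)" in the reduced form comes from.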

We remark that $f^0(v)$ remains the unique minimax decision function for any
$n_1$, $n_2$ when $l = \infty$, so that $\Omega$ is given by

(33') $\Omega = \{\omega : -\infty < m_1 < \infty,\; -\infty < m_2 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0\}.$


It is of interest to consider the “one sample” case when one of the means is
known, say $m_2 = c$. This will be the case (approximately) if $\pi_2$ is a standard
variety which has been in use for some time and $\pi_1$ is a new variety. The analogue
of the parameter space discussed above is then

(37) $\Omega = \left\{\omega : m_2 = c,\; a - \tfrac{l}{2} \le m_1 \le a + \tfrac{l}{2} : \sigma_0 - \rho \le \sigma \le \sigma_0\right\}, \quad \tfrac{l}{2} > |a - c|.$


By using Theorem 4 it can be seen that $f_c(v)$ as defined by (3) is the unique
minimax decision function if $c = a$ or if $c$ is not necessarily equal to $a$, but

(38) $\frac{l}{2} - |a - c| > \sigma_0 x_0 \left(\frac{1}{n_1}\right)^{1/2},$

where $x_0$ is given by (35). Since the left-hand side of (38) is positive, it is clear
that $f_c(v)$ will be the unique minimax decision function with respect to (37) if










$n_1$ is sufficiently large. Note that $f_c(v)$ is the unique minimax decision function
for any $n_1$ when $l = \infty$ and $\Omega$ is given by

(37') $\Omega = \{\omega : m_2 = c,\; -\infty < m_1 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0\}.$


The reader may find it instructive to consider other plausible sets $\Omega$ which
satisfy the conditions of Theorems 3 and 4 and also some which do not, assuming
$\sigma = 1$ for simplicity. It should be observed that no matter what $\Omega$ may be,
provided only that $\sigma \le \sigma_0$ for all $\omega$ in $\Omega$, we shall have by (32) and (35)

$\sup_{\omega \in \Omega}\,[r(f^0 \mid \omega)] \le .1700 \cdot \sigma_0 \cdot \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{1/2}$ (approx.).

In a similar way it can be seen that for any $\Omega$ in which $m_2$ equals $c$ and $\sigma \le \sigma_0$,

$\sup_{\omega \in \Omega}\,[r(f_c \mid \omega)] \le .1700 \cdot \sigma_0 \cdot \left(\frac{1}{n_1}\right)^{1/2}$ (approx.).
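The two bounds are easy to evaluate for given $\sigma_0$ and sample sizes. The sketch below again assumes that $G$ is the standard normal distribution function; the function names and the numerical inputs are illustrative only. It compares the risk (32) of $f^0$, evaluated over a grid of mean differences and a few values of $\sigma \le \sigma_0$, with the two-sample bound.

```python
import math

SUP_XG = 0.1700  # approximate value of sup x*G(-x) quoted in (35)


def G(x):
    """Standard normal distribution function (assumed meaning of G)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def risk_f0(m1, m2, sigma, n1, n2):
    """Risk (32) of f0: |m1 - m2| * G(-|d|), with |d| as in (32)."""
    scale = math.sqrt(1.0 / n1 + 1.0 / n2)
    diff = abs(m1 - m2)
    return diff * G(-diff / (sigma * scale))


def bound_two_sample(sigma0, n1, n2):
    """Upper bound on the risk of f0 when sigma <= sigma0."""
    return SUP_XG * sigma0 * math.sqrt(1.0 / n1 + 1.0 / n2)


def bound_one_sample(sigma0, n1):
    """Analogous bound when the second mean is known."""
    return SUP_XG * sigma0 * math.sqrt(1.0 / n1)


# Illustrative values (not from the paper).
sigma0, n1, n2 = 2.0, 8, 12
grid = [k * 0.01 for k in range(0, 1001)]
worst = max(risk_f0(d, 0.0, s, n1, n2) for d in grid for s in (0.5, 1.0, sigma0))
bound = bound_two_sample(sigma0, n1, n2)
print(worst, bound, worst <= bound)
```

The grid maximum is attained near $|m_1 - m_2| = x_0\sigma_0(1/n_1 + 1/n_2)^{1/2}$ with $\sigma = \sigma_0$, which is the point identified by (34) and (35).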

EXAMPLE 2. $\pi_1$ and $\pi_2$ are two soporific drugs, the random variables generated
by them being the duration of sleep induced by a standard dose in an individual
chosen at random. It is assumed that these two populations are normal with
unknown means $m_1$, $m_2$ and a common variance $\sigma^2$, also unknown. In a series
of independent trials in which $n_1$ individuals received the first drug and $n_2$ the
second, the outcome was $v = (x_{11}, x_{12}, \dots, x_{1n_1}; x_{21}, x_{22}, \dots, x_{2n_2})$. The
statistician S is required to say which is the more effective drug.

Here a reasonable risk function is (6), where $f(v)$ takes on only the values
0, 1, corresponding to the decisions “$m_1 < m_2$” and “$m_1 > m_2$” respectively.⁴
The problem of choosing $f(v)$ so as to minimize this risk was considered by Simon
[4]. He showed that in case $n_1 = n_2$, $f^0(v)$ is the uniformly best decision function
in the class of symmetric decision functions. (Given $n_1 = n_2 = n$, a decision
function $f(v)$ is said to be symmetric if $f(x_{11}, x_{12}, \dots, x_{1n}; x_{21}, x_{22}, \dots, x_{2n}) =
1 - f(x_{21}, x_{22}, \dots, x_{2n}; x_{11}, x_{12}, \dots, x_{1n})$. See also [3].) It is natural to confine
oneself to the class of symmetric decision functions when the sample sizes are
equal, but under the implicit assumption that if $\omega = (a, b : c)$ is a possible
parameter point, then $\omega' = (b, a : c)$ is also (cf. the remarks following (29)). The
illustrations in Section 1 show that if the sample sizes are unequal or if $\Omega$ is not
symmetric in the sense just described, there may exist decision functions which
are uniformly better than $f^0(v)$: in (i) we have a “symmetric” $\Omega$ but $n_1 \ne n_2$; in
(iii), $n_1 = n_2$ but $\Omega$ is not “symmetric.”

However, $f^0(v)$ is an admissible minimax decision function no matter what
the sample sizes, provided only that $\Omega$ satisfies a certain not too restrictive
condition. We have

(39) $\bar{r}(f^0 \mid \omega) = \begin{cases} G(-|d(\omega)|) & \text{for } m_1 \ne m_2, \\ 0 & \text{for } m_1 = m_2. \end{cases}$


⁴ For some purposes it would be more appropriate to take (1) as the risk function for this
problem, letting the decision functions f(v) take on only the values 0 and 1. We have (essen- 
tially) discussed this case in the previous example. 








It is clear that if $\{\omega_k\}$ is a sequence of points in $\Omega$ such that

$\lim_{k \to \infty} d(\omega_k) = 0$, then $\lim_{k \to \infty} \bar{r}(f^0 \mid \omega_k) = \tfrac{1}{2} = \sup_{\omega \in \Omega}\,[\bar{r}(f^0 \mid \omega)].$

Therefore, by Theorem 3, $f^0(v)$ is admissible and minimax if some point in the
plane $\{\omega : m_1 = m_2\}$ is an interior point of the set $\Omega$ of possible parameter points
(in fact it is sufficient if some plane $\sigma = \sigma_0$ (> 0) intersects $\Omega$ in a set which
has an interior point on the line $m_1 = m_2$). Hence if nothing much is known
about the two drugs, S could regard the foregoing as a justification for asserting
“$m_1 > m_2$” if $\bar{x}_1 > \bar{x}_2$ and “$m_1 < m_2$” otherwise.

We have given no criterion for the choice of a suitable decision function when
two or more admissible minimax decision functions exist, and our diffidence in
recommending the use of $f^0(v)$ in the present case is due to the fact that under
the condition stated above there will exist decision functions other than $f^0(v)$
which are also admissible and minimax with respect to (6). Let us suppose that
$\Omega$ is given by (33). Then $f^0(v)$ is admissible and minimax, by the preceding
paragraph. However, it follows from Theorem 4 that each of

$f_{c_1}(v) = \begin{cases} 1 & \text{if } \bar{x}_1 > c_1, \\ 0 & \text{otherwise,} \end{cases}$ and $f^{c_2}(v) = \begin{cases} 0 & \text{if } \bar{x}_2 > c_2, \\ 1 & \text{otherwise,} \end{cases}$

is also admissible and minimax, where $c_1$ and $c_2$ are arbitrary constants with

$\max[a, b] - \tfrac{l}{2} < c_1,\, c_2 < \min[a, b] + \tfrac{l}{2}.$

There is, however, some reason for preferring f’(v) to other decision functions 
in the present case. S has been asked to give his opinion as to which is the better 
drug, and presumably no immediate consequences follow from the opinion which 
he might express. (This would not be the case if there were a sleepless individual 
on hand who had to be given a dose of one of the two drugs. Cf. footnote 4.) 
Although the problem is of a scientific nature, insistence upon literal exactitude
in the interpretation of “incorrect decision” is meaningful only insofar as it is
compatible with the physical situation. In view of the limited determinacy of
unknown parameters in general, and of the limitations of experiments on soporific
drugs in particular, it may be possible and even desirable to modify (6) in such
a way that for any fixed $\sigma$ the risk tends to zero with $|m_1 - m_2|$. Thus modified,
the risk function would be essentially similar to (1). A rather drastic way of
introducing this modification would be to agree that the assertion of equality
of the two means does not constitute an error in case $|m_1 - m_2| \le \epsilon$, where $\epsilon$ is
some positive constant. S will then take

(40) $\bar{r}_\epsilon(f \mid \omega) = \begin{cases} \bar{r}(f \mid \omega) & \text{if } |m_1 - m_2| > \epsilon, \\ 0 & \text{otherwise,} \end{cases}$

as the risk function. (Note that in using $\bar{r}_\epsilon(f \mid \omega)$ rather than $\bar{r}(f \mid \omega)$, S has in
effect deleted the set $\{\omega : |m_1 - m_2| \le \epsilon\}$ from the given set $\Omega$ by defining $\gamma(\omega) =
(0, 0)$ there, instead of only when $m_1 = m_2$ as in the case of $\bar{r}(f \mid \omega)$. Cf. “zones of
indifference,” [5, pp. 27-30].) It follows from Theorem 3 that $f^0(v)$ is the unique
minimax decision function with respect to (40) and (33) if $a = b$ and $n_1 = n_2$
and also if at least one of these conditions does not hold but

$l > \max\left[\,|a - b| + \epsilon,\; \max(|a - b|,\, \epsilon) + \left|\frac{n_1 - n_2}{n_1 + n_2}\right| \epsilon\right].$

Thus $f^0(v)$ will be the unique minimax decision function no matter what $n_1$,
$n_2$, $a$, $b$ or $l$ may be, provided only that $\epsilon$ is sufficiently small. We shall leave
other modifications of $\bar{r}(f \mid \omega)$ and discussion of $\bar{r}(f \mid \omega)$ with respect to other
types of parameter spaces (e.g. (37)) to the reader.

We conclude this discussion with a remark on the proper choice of $n_1$ and $n_2$
in using $f^0(v)$ when the risk function belongs to the class R defined in Section
2, (C). (The risk functions (1) and (6) belong to R.) Suppose that before
experimentation starts, it is agreed that one must have $n_1 + n_2 = 2k$, where $k$ is a
fixed integer. In that case, choosing $n_1 = n_2 = k$ will be the best choice of $n_1$,
$n_2$ in the following sense. (a) For any fixed $\omega$, $r(f^0 \mid \omega)$, which is the expected loss,
then becomes a minimum. This follows immediately from (22), since

$r(f^0 \mid \omega) = h(\omega)\,G(-|d(\omega)|), \quad |d(\omega)| = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma,$

and $|d(\omega)|$ has its maximum when $n_1 = n_2 = k$. (b) For any fixed $\omega$, the variance
of the loss also becomes a minimum. In using $f^0(v)$, the loss takes the values 0
and $h(\omega)$ only, with $P(\text{loss} = h(\omega) \mid \omega) = G(-|d(\omega)|) = \alpha$ say. Therefore,
the variance of the loss is $h^2\alpha(1 - \alpha)$. Since $\alpha \le \tfrac{1}{2}$, this expression increases with
increasing $\alpha$, and so has its minimum when $n_1 = n_2 = k$. This remark is, of course,
without prejudice to the question of whether $f^0(v)$ is admissible and minimax with
respect to a given $\Omega$ for every $n_1$ and $n_2$ with $n_1 + n_2 = 2k$.
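Remarks (a) and (b) can be illustrated by enumerating every split of a fixed total $2k$. This is a minimal sketch under two assumptions: $G$ is taken to be the standard normal distribution function, and $h(\omega) = |m_1 - m_2|$ as in the risk (32). The function names and parameter values are illustrative.

```python
import math


def G(x):
    """Standard normal distribution function (assumed meaning of G)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def risk_and_variance(m1, m2, sigma, n1, n2):
    """Expected loss h*alpha and loss variance h^2*alpha*(1 - alpha) for f0."""
    h = abs(m1 - m2)  # assumed form of h(omega), as in (32)
    d = h / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))
    alpha = G(-d)
    return h * alpha, h * h * alpha * (1.0 - alpha)


def best_splits(k, m1, m2, sigma):
    """Return the (n1, n2) with n1 + n2 = 2k minimizing risk and variance."""
    splits = [(n1, 2 * k - n1) for n1 in range(1, 2 * k)]
    by_risk = min(splits, key=lambda s: risk_and_variance(m1, m2, sigma, *s)[0])
    by_var = min(splits, key=lambda s: risk_and_variance(m1, m2, sigma, *s)[1])
    return by_risk, by_var


print(best_splits(10, 1.0, 0.5, 1.0))
```

Both criteria select the equal split, because $1/n_1 + 1/n_2$ is smallest when $n_1 = n_2 = k$, which makes $|d(\omega)|$ largest and $\alpha$ smallest.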


4. A remark on randomized decision functions. In the foregoing discussion
we have confined attention to the class of non-randomized decision functions:
the space of possible decisions being some subset of $0 \le f \le 1$, the statistician
constructs (in advance) a suitable decision function $f(v)$, obtains a particular
sample point $v$ by sampling the two populations, and takes $f(v)$ as his decision.
It is, however, of some theoretical interest to consider more general formulations
in which the decision arrived at by the statistician may be a random function
of the sample point $v$.

A randomized decision function can be defined in several ways. One definition
is as follows. Let $\phi(z \mid v)$ be a function defined for all $v$ in $E_N$ and all real $z$ such
that for any fixed $z$ it is a measurable function of $v$, and such that for any fixed
$v$ it is the distribution function of a random variable with values in $0 \le z \le 1$.
We shall denote this random variable by $Z_\phi(v)$ and call it a (randomized) decision
function. In using it, the statistician first obtains a particular point $v$ by sampling
the two populations, then performs a random experiment whose outcome $Z$








has the known distribution function $P(Z \le z) = \phi(z \mid v)$, and takes $Z$ as his
decision. The class of all decision functions corresponding to all functions $\phi(z \mid v)$
will be denoted by $\{Z_\phi(v)\}$. It is clear that this class includes the class of
non-randomized decision functions.

This definition of the structure of randomized decision functions follows the
method described by Halmos and Savage in their interesting remarks ([6], pp.
239-241) on the value of sufficient statistics in statistical methodology. For
any $Z_\phi(v)$, we have

(41) $P(Z_\phi(v) \le z \mid \omega) = \int_{E_N} P(Z_\phi(v) \le z \mid \omega, v)\, dK(v \mid \omega) = \int_{E_N} \phi(z \mid v)\, dK(v \mid \omega).$





We shall now show that in all problems of the greater mean in which the 
methods of Section 2 can be applied to non-randomized decision functions, ran- 
domization cannot be recommended. More precisely, the following holds. 

THEOREM. Let $f(v)$ be a non-randomized decision function which takes on only
the values 0 and 1 and which is the unique non-randomized decision function whose
expected value $E[f \mid \omega]$ satisfies a certain condition Q as a function of $\omega$. Then $f(v)$
is the unique decision function whose expected value satisfies the condition Q; i.e. if
$Z_\phi(v)$ is a decision function such that $E[Z_\phi \mid \omega]$ satisfies Q, then

(42) $P(f(v) = Z_\phi(v) \mid \omega) = 1$ for all $\omega$.

It follows in particular that Theorem 2 remains valid with the arbitrary
non-randomized $f(v)$ replaced by an arbitrary $Z_\phi(v)$, and in consequence, Theorems 3 and 4
remain valid when the class of decision functions in question is $\{Z_\phi(v)\}$.

PROOF. Let $Z_\phi(v)$ be a decision function whose expected value satisfies the
condition Q. Now, by (41) and Theorem 5 of [7] we have

(43) $E[Z_\phi \mid \omega] = \int_{E_N} f^\phi(v)\, dK(v \mid \omega) = E[f^\phi \mid \omega],$

where

(44) $f^\phi(v) = \int_0^1 z\, d_z\phi(z \mid v), \quad 0 \le f^\phi(v) \le 1.$

It is clear from (43) that $E[f^\phi \mid \omega]$ satisfies Q and so we must have

(45) $f^\phi(v) = f(v)$ a.e.

by hypothesis. Since $f(v)$ takes on only the values 0 and 1, it follows from (44)
and (45) that

$\int_{\{z = f(v)\}} d_z\phi(z \mid v) = 1$ a.e.,














which implies (42). In order to verify the last part of the remark, consider any
particular problem of the greater mean. The risk function of any decision
function $Z_\phi(v)$ is, by (15),

$r(Z_\phi \mid \omega) = W(\omega, E[Z_\phi \mid \omega]).$

Hence a condition on the risk function of $Z_\phi$ is equivalent to a condition on
$E[Z_\phi \mid \omega]$ as a function of $\omega$, and the truth of the remark follows by appropriate
definition of the condition Q in terms of the risk function.


REFERENCES 

[1] A. Wald, “Statistical decision functions,” Annals of Math. Stat., Vol. 20 (1949), pp.
165-205.

[2] J. Neyman and E. S. Pearson, “Contributions to the theory of testing statistical
hypotheses,” Stat. Res. Memoirs, Vol. I (1936), pp. 1-37.

[3] R. R. Bahadur, “On a problem in the theory of k populations,” Annals of Math. Stat.,
Vol. 21 (1950), pp. 362-375.

[4] H. A. Simon, “Symmetric tests of the hypothesis that the mean of one normal population
exceeds that of another,” Annals of Math. Stat., Vol. 14 (1943), pp. 149-154.

[5] A. Wald, Sequential analysis, John Wiley and Co., 1947.

[6] P. R. Halmos and L. J. Savage, “Application of the Radon-Nikodym theorem to the
theory of sufficient statistics,” Annals of Math. Stat., Vol. 20 (1949), pp. 225-241.

[7] H. Robbins, “Mixture of distributions,” Annals of Math. Stat., Vol. 19 (1948), pp. 360-
369.











ANALYSIS OF EXTREME VALUES 
By W. J. Dixon¹


University of Oregon 


1. Introduction. It is well recognized by those who collect or analyze data 
that values occur in a sample of n observations which are so far removed from 
the remaining values that the analyst is not willing to believe that these values 
have come from the same population. Many times values occur which are
“dubious” in the eyes of the analyst and he feels that he should make a decision as
to whether to accept or reject these values as part of his sample. On the other 
hand he may not be looking for an error, but may wish to recognize a situation 
when an occasional observation occurs which is from a different population. 
He may wish to discover whether a significant analysis of variance indicates an 
extreme value significantly different from the remainder. Also, of course, the 
extreme value may differ significantly without causing a significant analysis 
of variance and he may wish to discover this. It is reasonable to suppose that a 
criterion for rejecting observations would be useful here also. The choice of a
suitable criterion for rejecting observations introduces a number of questions. 

1. Should any observations be removed if we wish a representative sample in- 
cluding whatever contamination arises naturally? In other words, it may be 
desirable to describe the population including all observations, for only in that 
way do we describe what is actually happening. 

2. If the analyst wishes to sample the population unaffected by contamination 
he must either remove the contaminating items or employ statistical procedures 
which reduce to a minimum the effect of the contamination on the estimates of 
the population. That is, he may wish to describe only 95% of his population 
if the description is altered radically by the remaining 5% of the observations. 
He may have external reasons which are good and sufficient for wishing to
describe only 95% of his observations. Suppose he wishes to use the sample for a
statistical inference; the inclusion of all the data may sufficiently violate the 
assumptions underlying the inference to exclude the possibility of making a valid 
inference. 

This paper will concern itself only with those problems which arise from
Question 2.

If we wish to follow some procedure which attempts to remove contamination 
we must consider the performance of any proposed criterion with respect to the 
proportion of contamination the criterion will discover and, of course, the propor- 
tion of the “good” observations which are removed by the use of the criterion. 
But, perhaps more important, we must consider what sort of bias will result 
when the standard statistical procedures are applied to samples of observations 
which have been processed in this manner. 


¹ This paper was prepared under a contract with the Office of Naval Research.


If we wish to follow a procedure which will not search for particular values to 
be excluded but will minimize their effect if present, we must investigate the 
sampling distributions of these modified statistics and estimate the loss in
information resulting from their use when all observations are “good.” We must
also investigate the expected bias which will result when “bad” items are present
even though essentially excluded. Perhaps most disturbing about the avoidance
of “bad” items is the fact that a decision must still be made as to whether a
“bad” item was present or not in order to know in which way our estimates may
be biased. For example, a sample mean computed by avoiding the two end
observations will not be a biased estimate of the mean of a symmetric population
if both end items should actually be included or if both end items should not be
included. However, if only one of the two should not be included this estimate of
the mean will be biased.


2. Models of contamination. The performance of the various criteria for
discovery of one or more contaminators will be measured with reference to
contaminations of the following two types entering into samples of observations
from a normal population with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$:

A. One or more observations from $N(\mu + \lambda\sigma, \sigma^2)$,
B. One or more observations from $N(\mu, \lambda^2\sigma^2)$.


A represents the occurrence of an “error” in mean value such as will occur in 
dial readings when errors are made in reading incorrectly digits other than the 
last one or two digits. Errors of this sort may result from momentary shifts in 
line voltage or from the inclusion among a group of objects of one or two items 
of completely different origin. This type of contamination will be referred to as 
“location error.” B represents the occurrence of an “error” from a population 
with the same mean but with a greater variance than the remainder of the sample. 
This type of error will be referred to as a “scalar error.” It is likely that many 
errors could be better described as a combination of A and B, but a study of these 
two errors separately should throw considerable light on the question of ‘gross 
errors” or “blunders.” 
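The two contamination models are straightforward to simulate. The sketch below is a modern illustration using Python's standard library, not the sampling procedure used for this paper; the function name and its arguments are hypothetical. It draws the uncontaminated observations from $N(\mu, \sigma^2)$ and the contaminators from $N(\mu + \lambda\sigma, \sigma^2)$ (location error) or $N(\mu, \lambda^2\sigma^2)$ (scalar error), and returns the two groups separately so that a later step can tell which values are contaminators.

```python
import random


def contaminated_sample(n, n_bad, lam, kind, mu=0.0, sigma=1.0, rng=None):
    """Draw n observations, n_bad of them from the contaminating population.

    kind == "location": contaminators come from N(mu + lam*sigma, sigma^2).
    kind == "scalar":   contaminators come from N(mu, lam^2 * sigma^2).
    Returns (good, bad) as two lists.
    """
    if not 0 <= n_bad <= n:
        raise ValueError("n_bad must lie between 0 and n")
    rng = rng or random.Random()
    good = [rng.gauss(mu, sigma) for _ in range(n - n_bad)]
    if kind == "location":
        bad = [rng.gauss(mu + lam * sigma, sigma) for _ in range(n_bad)]
    elif kind == "scalar":
        bad = [rng.gauss(mu, lam * sigma) for _ in range(n_bad)]
    else:
        raise ValueError("kind must be 'location' or 'scalar'")
    return good, bad


# One location error with lambda = 5 in a sample of 15.
good, bad = contaminated_sample(15, 1, 5.0, "location", rng=random.Random(0))
print(sorted(good + bad))
```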

Many authors have written on the subject of the rejection of outlying observa- 
tions. Apparently none have been successful in obtaining a general solution to 
the problem. Nor has there been success in the development of a criterion for 
discovery of outliers by means of a general statistical theory; e.g., maximum 
likelihood. A large number of criteria have been advanced on more or less intui- 
tive grounds as appropriate criteria for this purpose. In no case was investigation 
made of the performance of these criteria except for a few illustrative examples. 

References for the criteria discussed in the next section are given at the end
of this paper. Indications are given as to the significance values available in
those papers.










3. Criteria to be considered. The performance of two types of criteria has
been investigated for samples contaminated with location or scalar errors.

a) $\sigma$ known or estimated independently,

b) $\sigma$ unknown.

The $n$ observations are ordered $x_1 \le x_2 \le \cdots \le x_n$. The criteria involving
external knowledge of $\sigma$ are:

A. $\chi^2$ test,

$\chi^2 = \sum (x - \bar{x})^2 / \sigma^2.$

B. Extreme deviation,

$B_1 = \frac{x_n - \bar{x}}{\sigma}$ (or $\frac{\bar{x} - x_1}{\sigma}$),

$B_2 = \frac{x_n - x_{n-1}}{\sigma}$ (or $\frac{x_2 - x_1}{\sigma}$).

C. Range,

$C_1 = \frac{w}{\sigma}, \quad w = x_n - x_1,$

$C_2 = \frac{w}{s}$ ($s$ independently estimated).

The criteria involving only the information of a single sample of $n$ observations
are:

D. Modified F test.

1. For single outlier $x_1$,

$D_1 = \frac{S_1^2}{S^2}$, where $S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2$, $\bar{x}_1 = \sum_{i=2}^{n} x_i/(n - 1)$

(or for $x_n$, $D_1 = \frac{S_n^2}{S^2}$).

2. For double outliers $x_1$, $x_2$,

$D_2 = \frac{S_{1,2}^2}{S^2}$, where $S_{1,2}^2 = \sum_{i=3}^{n} (x_i - \bar{x}_{1,2})^2$, $\bar{x}_{1,2} = \sum_{i=3}^{n} x_i/(n - 2)$

(or for $x_n$, $x_{n-1}$, $D_2 = \frac{S_{n,n-1}^2}{S^2}$).


E. Ratios of ranges and subranges.

1. For single outlier $x_1$,

$r_{10} = \frac{x_2 - x_1}{x_n - x_1}$ (or for $x_n$, $r_{10} = \frac{x_n - x_{n-1}}{x_n - x_1}$).

2. For single outlier $x_1$ avoiding $x_n$,

$r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1}$ (or for $x_n$ avoiding $x_1$, $r_{11} = \frac{x_n - x_{n-1}}{x_n - x_2}$).

3. For single outlier $x_1$, avoiding $x_n$, $x_{n-1}$,

$r_{12} = \frac{x_2 - x_1}{x_{n-2} - x_1}$ (or for $x_n$ avoiding $x_1$, $x_2$, $r_{12} = \frac{x_n - x_{n-1}}{x_n - x_3}$).

4. For outlier $x_1$ avoiding $x_2$,

$r_{20} = \frac{x_3 - x_1}{x_n - x_1}$ (or for $x_n$ avoiding $x_{n-1}$, $r_{20} = \frac{x_n - x_{n-2}}{x_n - x_1}$).

5. For outlier $x_1$ avoiding $x_2$ and $x_n$,

$r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1}$ (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $r_{21} = \frac{x_n - x_{n-2}}{x_n - x_2}$).

6. For outlier $x_1$ avoiding $x_2$ and $x_n$, $x_{n-1}$,

$r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1}$ (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $x_2$, $r_{22} = \frac{x_n - x_{n-2}}{x_n - x_3}$).

F. Extreme deviation and standard deviation.
For single outlier $x_n$,

$F = \frac{x_n - \bar{x}}{s}$ (or for $x_1$, $F = \frac{\bar{x} - x_1}{s}$).

The performance of the large number of criteria listed here will be assessed
with respect to discovery of contamination of the type given in Section 2.
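Several of the criteria above can be written as short functions of the ordered sample. The sketch below is illustrative only: the function names are hypothetical, and the general ratio $r_{ij}$ is coded from the pattern of the six ratios in E (numerator skips $i - 1$ neighbours of the suspected value, denominator drops $j$ values at the opposite end). For F it assumes $s^2 = S^2/n$, which is the convention under which the identity $D_1 = 1 - F^2/(n - 1)$ holds exactly; the paper does not state the divisor.

```python
import math


def sum_sq(xs):
    """Sum of squared deviations about the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def b1(xs, sigma, upper=True):
    """Extreme deviation B1 with known sigma."""
    s = sorted(xs)
    mean = sum(s) / len(s)
    return (s[-1] - mean) / sigma if upper else (mean - s[0]) / sigma


def b2(xs, sigma, upper=True):
    """Gap between the extreme value and its neighbour, over sigma."""
    s = sorted(xs)
    return (s[-1] - s[-2]) / sigma if upper else (s[1] - s[0]) / sigma


def c1(xs, sigma):
    """Range over known sigma."""
    return (max(xs) - min(xs)) / sigma


def d1(xs, upper=True):
    """Modified F test for a single outlier: S_n^2/S^2 (or S_1^2/S^2)."""
    s = sorted(xs)
    rest = s[:-1] if upper else s[1:]
    return sum_sq(rest) / sum_sq(s)


def dixon_r(xs, i, j, upper=True):
    """Ratio r_ij for the largest (upper=True) or smallest observation."""
    s = sorted(xs)
    n = len(s)
    if upper:
        return (s[n - 1] - s[n - 1 - i]) / (s[n - 1] - s[j])
    return (s[i] - s[0]) / (s[n - 1 - j] - s[0])


def f_stat(xs, upper=True):
    """Studentized extreme deviation, assuming s^2 = S^2/n."""
    s = sorted(xs)
    n = len(s)
    mean = sum(s) / n
    sd = math.sqrt(sum_sq(s) / n)
    return (s[-1] - mean) / sd if upper else (mean - s[0]) / sd


xs = [10.0, 1.0, 3.0, 2.0, 4.0]
print(b1(xs, 1.0), c1(xs, 1.0), dixon_r(xs, 1, 0), dixon_r(xs, 2, 0), d1(xs))
```

For the sample shown, the largest value 10 gives $r_{10} = 6/9$ and $r_{20} = 7/9$, and $D_1 = 5/50$.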










4. Performance of criteria (estimate of $\sigma$ available). The $\chi^2$ test will of course
give an indication of a large dispersion and since the extreme values are chief
contributors to the sum of squares, it is possible to use this test as a criterion for
rejecting a value or values which are at the greatest distance from the mean.
It might be supposed that $B_1$ and $B_2$ would give better results since particular
attention is paid to the end item. The same argument would influence one in
favor of $C_1$ or $C_2$. The performance of $C_2$ can, of course, be expected to vary with
the degrees of freedom in the independent estimate of $\sigma$. For this study the
degrees of freedom for this estimate were held to the single value 9 df.

$\chi^2$ may be used since if the value of $\chi^2$ is too large (greater than some upper
percentage point for $\chi^2$) we might reject the value most distant from the mean.
$\chi^2$ tables may be used for percentage points. Percentage points for the other
statistics considered here are given in the references at the end of this paper.

The criteria A, $B_1$, $B_2$, $C_1$, $C_2$ were investigated for $\alpha$ = 1%, 5% and 10%
for $\lambda$ = 2, 3, 5, 7, where one or more items are selected from a population
$N(\mu + \lambda\sigma, \sigma^2)$ and the remainder from $N(\mu, \sigma^2)$. Investigations were also made for one
item from $N(\mu, \lambda^2\sigma^2)$ for $\lambda$ = 2, 4, 8, 12. The investigation was carried out by
sampling methods. The performances of different criteria were assessed for the
same group of samples in order to obtain more precision in the comparison of the
different tests. All of the points appearing on the graphs in the subsequent
sections of this paper were based on from 66 to 200 determinations.

The performance of the above criteria is measured by computing the propor- 
tion of the time the contaminating distribution provides an extreme value and 
the test discovers this value. Of course, performance could be measured by the 
proportion of the time the test gives a significant value when a member of the 
contaminating population is present in the sample, even though not at an ex- 
treme. However, since it is assumed that discovery of an outlier will frequently 
be followed by the rejection of an extreme we shall consider discovery a success 
only when the extreme value is from the contaminating distribution. 

The performance was judged by applying the criteria to each sample, always 
suspecting an outlier in the direction of the shifted mean for location error. 
Since the location errors were inserted by adding a fixed value to one or more 
of the observations, the largest value was tested as an outlier. The measure of 
performance was the percentage of location errors identified. When the location 
error was not an outlier, no test was performed and a failure for the test recorded. 
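The measure of performance described above can be reproduced approximately by simulation. The sketch below is a modern reconstruction, not the sampling experiment reported here: it uses Python's pseudo-random generator, takes $\mu = 0$ and $\sigma = 1$, and estimates the critical value of $B_1$ by simulation instead of reading it from published tables. All names are hypothetical. A trial counts as a success only when the single location error is the largest value in the sample and $B_1$ exceeds the critical value.

```python
import random


def b1_upper(xs, sigma):
    """B1 = (largest value - sample mean) / sigma."""
    return (max(xs) - sum(xs) / len(xs)) / sigma


def critical_value(n, alpha, trials, rng):
    """Monte Carlo estimate of the upper-alpha point of B1 without contamination."""
    stats = sorted(
        b1_upper([rng.gauss(0.0, 1.0) for _ in range(n)], 1.0)
        for _ in range(trials)
    )
    return stats[min(trials - 1, int((1.0 - alpha) * trials))]


def discovery_rate(n, lam, alpha, trials, rng, null_trials=20000):
    """Proportion of samples in which one location error is discovered by B1."""
    crit = critical_value(n, alpha, null_trials, rng)
    successes = 0
    for _ in range(trials):
        good = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        bad = rng.gauss(lam, 1.0)  # one observation from N(mu + lam*sigma, sigma^2)
        # If the contaminator is not the extreme, no test is made: a failure.
        if bad >= max(good) and b1_upper(good + [bad], 1.0) > crit:
            successes += 1
    return successes / trials


rng = random.Random(1950)
for lam in (2, 3, 5, 7):
    print(lam, discovery_rate(5, lam, 0.05, 2000, rng, null_trials=5000))
```

Because both the critical value and the discovery rate are estimated from random samples, repeated runs with different seeds give slightly different percentages.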

In the case of the model of contamination involving the scalar error, the value 
was suspected which was farthest from the mean. This of course, alters somewhat 
the level of significance, but this procedure was followed alike for all criteria 
investigated. The performance was measured in the same fashion as for location 
errors. 

Considering first, location errors, a study of the performance curves showing
the per cent discovery of contaminators plotted against $\lambda$ (the number of standard
deviation units the population of contaminators is removed from the remainder),
shows that the level of performance for $\sigma$ known is considerably above the level
of performance when $\sigma$ is not known. The difference is greater for $n = 5$ than
for $n = 15$ and, of course, the difference will diminish as the sample size increases.
Figure 1 shows the performance curves for $\alpha$ = 5% (5% significance level for
the test for an outlier) of $B_1 = (x_n - \bar{x})/\sigma$ for $n = 5$ and $n = 15$ and of
$r_{10} = (x_n - x_{n-1})/(x_n - x_1)$ for $n = 5$ and $n = 15$.

The graphs for $\alpha$ = 1% and 10% would be similar in appearance. Figure 2
indicates the change in performance for $\alpha$ = 1%, 5%, and 10%. The curves
plotted are for $B_1 = (x_n - \bar{x})/\sigma$. The curves for A, $B_2$, $C_1$, $C_2$ show very similar
results.

The curve for test $B_1$ was used in Figures 1 and 2 since it gives the best
performance of all criteria which are considered here if a single location error is
present. The curves showing the comparative performance of these criteria as
well as one to be considered later ($r_{10}$) are given in Figure 3 for $\alpha$ = 5% and for
$n = 5$ and $n = 15$.

Fig. 1. Improvement in performance obtained with knowledge of $\sigma$, $\alpha$ = 5%, $n$ = 5, 15.

Fig. 2. The effect of the level of significance on the performance of $B_1$; $\alpha$ = 1%, 5%, 10%; $n$ = 5, 15.

The following statements can be made from inspection of Figure 3:

a) The differences among A, $B_1$, $B_2$, and $C_1$ are not great.

b) The knowledge of $\sigma$ is less important in larger samples.

c) The curve for $C_2$ lies above that of $r_{10}$ for $n = 5$ and below that of $r_{10}$ for
$n = 15$. This is consistent with the use of 9 df. in the independent estimate
of $\sigma$.

If the question of ease in computation or application is important, it may be
desirable to use $B_2$ or $C_1$ in place of $B_1$ for they are slightly easier to compute
and it is not necessary to measure all observations to obtain the value of these
statistics. From Figure 3 it will be noted that the performances of these criteria
are nearly as good as for $B_1$. If two outliers may be expected in a single sample,
the performance of $B_2$ will be lowered and the performance of $B_1$ and $C_1$ will be
improved. Any difference between the performance of $B_1$ and the performance
of $C_1$ when two outliers are present was not discernible for $n = 5$ or 15. Figure 4
illustrates the improvement in performance for $B_1$ for $\alpha$ = 5% and $n = 15$.

Fig. 3. Comparison of the performance of criteria using $\sigma$ known (or using external
estimates of $\sigma$) and $r_{10}$ for samples of size 5 and 15, $\alpha$ = 5%.

The performance curves of these criteria if a scalar error is present are very
similar to those above except that:

1. A high level of performance is approached very slowly. For example, see
Figure 5 showing the performance of $B_1$ and $r_{10}$ for $n = 5$ and $n = 15$ and $\alpha$ = 5%.

2. There is a smaller difference in the performance between the criteria with
$\sigma$ known and $\sigma$ unknown (see Figure 5).

The performance of $B_1$ and $C_1$ is noticeably increased by the introduction
of more contaminators while that of $B_2$ decreases. No difference in the
performance of $B_1$ and $C_1$ was noted for either $n = 5$ or $n = 15$. Figure 6 shows the
increase in performance of two contaminators for $B_1$ for $n = 15$, $\alpha$ = 5%.

Fig. 4. Comparison of the performance of $B_1$ for one and two location errors in samples
of size 15, $\alpha$ = 5%. (Curves: one error, solid; two errors, dashed.)

The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of $B_1$ or $C_1$ if $\sigma$ is known.

Criterion $C_1$ is recommended since:

1. Its performance is almost as good as the performance of $B_1$ for a single
outlier. Their performances are about equal for two outliers and $C_1$ affords
protection for outliers either above or below the mean.

2. It is simple to compute.

If ease of computation is not essential and maximum performance is desired,
the criterion $B_1$ should be used. The performance of $C_2$ will approach that of
$B_1$ as the number of degrees of freedom in the denominator increases.


Fig. 5. Comparison of the performance of $B_1$ and $r_{10}$ for one scalar error for samples of
size 5 and 15, $\alpha$ = 5%.

Fig. 6. Comparison of the performance of $B_1$ for one and two scalar errors in samples of
size 15, $\alpha$ = 5%. (Curves: one error, solid; two errors, dashed.)


5. Performance of criteria (no external estimate of $\sigma$). Criteria $D_1$ and $D_2$
have strong intuitive reasons for their use since the dispersion is estimated by
$s^2$. The $r$ ratios are attractive because of their simplicity and their preoccupation
with the extreme values. Test F is the “studentized” ratio corresponding to $B_1$,
and is equivalent to $D_1$ since $D_1 = 1 - F^2/(n - 1)$. There is no apparent
difference in the performance of $D_1$ and $r_{10}$ when one outlier is present and no
apparent difference in $D_2$ and $r_{20}$ when two outliers are present. This is true for
both models of contamination and for the three levels of significance investigated.
However the comparison of $D_2$ and $r_{20}$ was made only for $n = 5$ since critical
values are not available² for $D_2$ for $n = 15$. (Critical values are available for
$n \le 12$.)

The performance of $D_1$ and $r_{10}$ under the two models of contamination can
be obtained by reference to the curve for $r_{10}$ in Figure 1 and Figure 5. The curve
for $D_1$ is practically identical with the curve for $r_{10}$.


² After this paper was submitted, the critical values of $D_2$ have been extended to $n \le 20$
(see references).










There is no question that $r_{10}$ is simpler to use, so that if this condition of
contamination (scalar errors) exists, $r_{10}$ would probably be chosen. However, as
before, we should investigate what happens when more than one error is present.
$D_2$ is designed for this case as is $r_{20}$. Since the performance of these two criteria
is approximately the same, $r_{20}$ would probably be chosen because of its simplicity.
Critical values for this statistic are available for $n \le 30$.

$r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$ were designed for use in situations where additional
outliers may occur and we wish to minimize the effect of these outliers on the
investigation of the particular value being tested.

It has been suggested that $D_1$ could be used repeatedly to remove more than
one outlier from a sample. This procedure cannot be recommended since the
presence of additional outliers handicaps the performance of both $D_1$ and $r_{10}$
for small sample sizes and therefore the process of rejection might never get
started. For larger sample sizes the performance of $D_1$ is affected much less by
the presence of two errors than is the performance of $r_{10}$. The repetitive use of
$D_1$ is not recommended in this case either since $r_{20}$ performs in a superior
manner to $D_1$ in such situations. This difference in performance of $D_1$ and $r_{20}$
depends markedly on the level of significance used as well as the sample size.
For small samples there is little difference in performance for any of the levels
of significance one might use. For the larger sample sizes there is no appreciable
difference for very high levels of significance. The difference is however very
great for lower levels of significance. In fact, as $\lambda$ increases for two errors of the
location type, the level of significance which divides the region of approach to
zero performance from the region of approach to perfect performance of $D_1$ is
given by the level of significance corresponding to a significance value of
$\frac{1}{2}\cdot\frac{n}{n-1}$ for $D_1$. Thus, for example, in samples of size 15,
$\frac{1}{2}\cdot\frac{15}{14} = .536$.
This value lies between the values for the 2.5% and 5% level of significance.
These values are .503 and .556 respectively. Therefore the use of the 1% or
2.5% levels will give poorer and poorer performance as $\lambda$ increases, and the
use of the 5% or 10% levels will give better and better performance as $\lambda$ increases
when two errors are present. The dividing point is such that for samples of
size 11 or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size $n \le 14$ the 1%,
2.5% and 5% levels have the same effect, and for samples of size $n \le 16$ the 1%
and 2.5%, for samples of size $n \le 19$ just the 1% level. For three such errors
the limit approached by $D_1$ as $\lambda$ increases is $\frac{2}{3}\cdot\frac{n}{n-1}$. Therefore, the
performance of $D_1$ will approach zero for all levels of significance and for all sample
sizes for which critical values are known except the 10% level of significance
for sample sizes larger than 21. An indication of these limiting values
$\frac{k-1}{k}\cdot\frac{n}{n-1}$
for $k$ contaminations present can be obtained by considering these $k$ values to











Fig. 7. Comparison of the performance of the $r$ criteria for one location error in samples of size 5, $\alpha$ = 5%.

Fig. 8. Comparison of the performance of the $r$ criteria for one scalar error in samples of size 5, $\alpha$ = 5%.


be at a distance $\lambda$ from the population mean, computing $D_1$ and allowing $\lambda$ to
increase indefinitely.
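
The limiting argument just described can be sketched numerically. The following is an illustration under stated assumptions, not the sampling study of the paper: $D_1$ is taken to be the ratio of the sum of squares omitting the largest observation to the full sum of squares, the uncontaminated values are replaced by a fixed, evenly spaced set (a hypothetical choice made only to keep the example deterministic), and the $k$ contaminating values are all placed at the same distance `lam` from the center.

```python
def d1_statistic(sample):
    """Assumed form of D1: sum of squares without the largest value / full sum of squares."""
    xs = sorted(sample)
    n = len(xs)
    mean_all = sum(xs) / n
    ss_all = sum((x - mean_all) ** 2 for x in xs)
    rest = xs[:-1]
    mean_rest = sum(rest) / (n - 1)
    ss_rest = sum((x - mean_rest) ** 2 for x in rest)
    return ss_rest / ss_all


def d1_limit(k, n):
    """Limiting value of D1 for k contaminations: ((k - 1) / k) * (n / (n - 1))."""
    return (k - 1) / k * n / (n - 1)


def contaminated_sample(n, k, lam):
    """n - k fixed values spread over [-1, 1] plus k values at distance lam (illustrative)."""
    m = n - k
    good = [-1.0 + 2.0 * i / (m - 1) for i in range(m)]
    return good + [float(lam)] * k


# D1 moves toward the limiting value as the contaminators recede.
for lam in (5, 50, 500):
    print(lam, d1_statistic(contaminated_sample(15, 2, lam)))
print("limit for k = 2, n = 15:", d1_limit(2, 15))
```

Comparing `d1_limit(k, n)` with the tabulated critical value of $D_1$ at a given level then indicates whether performance tends to zero or to perfect detection as $\lambda$ grows.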

The comparative performance of the $r$ criteria, $\alpha$ = 5%, in samples of size 5
for the two models of contamination (one contaminator present) is given in
Figures 7 and 8. For samples of size 15 the curves are given in Figures 9 and 10.
A single curve suffices here since there is no discernible difference in the curves
for the different $r$ criteria. There is considerable difference in the performance
curves if more than one outlier is present. However, the performances of $r_{10}$,
$r_{11}$, $r_{12}$ are essentially the same when two location outliers are present, as are
the performances of $r_{20}$, $r_{21}$, $r_{22}$. Figures 11 and 12 show the comparative
performance of $r_{10}$, $r_{11}$, $r_{12}$ for one and two contaminators for $\alpha$ = 5% and $n = 5$.
Figures 13 and 14 are for $n = 15$. Figures 15 and 16 show the comparative
performance for $r_{20}$, $r_{21}$ ($r_{22}$ is not a test for $n = 5$) for one and two contaminators
for $\alpha$ = 5% and $n = 5$. Figures 17 and 18 are for $r_{20}$, $r_{21}$, $r_{22}$ for $n = 15$. The
six curves represented by the single curve of Figure 17 lie within 5% of the
curve shown. The same is true of the three curves represented by each of the
two curves of Figure 18.

Fig. 9. Performance of the $r$ criteria for one location error in samples of size 15, $\alpha$ = 5%.

Fig. 10. Performance of the $r$ criteria for one scalar error in samples of size 15, $\alpha$ = 5%.

Fig. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location errors in samples of size 5, $\alpha$ = 5%.

Fig. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar errors in samples of size 5, $\alpha$ = 5%.

Fig. 13. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location errors in samples of size 15, $\alpha$ = 5%.

Fig. 14. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar errors in samples of size 15, $\alpha$ = 5%.

Fig. 15. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location errors in samples of size 5, $\alpha$ = 5%.

Fig. 16. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar errors in samples of size 5, $\alpha$ = 5%.

Since no loss in performance results for larger samples from the use of $r_{20}$,
$r_{21}$, $r_{22}$ in place of $r_{10}$, $r_{11}$, $r_{12}$, and further, these criteria are not appreciably
affected by the presence of another outlier, it would seem unwise to recommend
the use of $r_{10}$, $r_{11}$, $r_{12}$. However, note that for small samples (see Figures 11 and
12) the performances of $r_{10}$, $r_{11}$ and $r_{12}$ are considerably better when a single
outlier is present. Therefore in larger (n > 10) samples ro or 72 would appear 
to be the best criteria. In samples of size 10 or less, 719 or 72 should be used; 
ra if the extreme value at the opposite end should be avoided. 

It should be noted in the comparisons that no model of contamination was
investigated which would cause one or more errors at both extremes in the
sample. It is obvious that the performance of $D_1$ and $D_2$ would be considerably
decreased while the performance of $r_{11}$, $r_{12}$, and $r_{21}$, $r_{22}$ would not be materially
affected since these criteria avoid values at the opposite extreme. Their repeated
use might discover most of such outliers, while $D_1$ or $D_2$ might fail on the first
trial.


Fig. 17. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two location errors in samples of size 15, $\alpha$ = 5%.

Fig. 18. Comparison of the performance of the $r_{2\cdot}$ criteria for one and two scalar errors in samples of size 15, $\alpha$ = 5%.

Fig. 19. Performance of $B_1$ for various levels of significance when the population is 10% contaminated with location errors (panels: $n = 5$ and $n = 15$).


6. Sampling from a contaminated population. In the previous sections the
performance of the various criteria was assessed for samples where a certain
number of contaminators were present. One might well ask why a test is needed
if it is known that contaminators are present. It would seem more realistic to
state that a certain per cent of contamination will occur in the long run and
that one will not know in any particular case whether 0, 1, 2, ··· contaminators
will be present. One would then wish a criterion to indicate the presence of
contamination in a particular sample.

The performances of these criteria will be investigated for the same two
models of contamination and their performances will be reported as per cent of
total contamination discovered. The tests will be applied only once to each
sample. Repeated use of the criterion would in many cases increase the per cent
of total contamination discovered. It is not known what effect such a procedure
would have on the level of significance.

Fig. 20. Performance of $B_1$ for various levels of significance when the population is 10% contaminated with scalar errors (panels: $n = 5$ and $n = 15$).

Fig. 21. Performance of $B_1$ for various levels of contamination for location errors and using the 5% level of significance.

Fig. 22. Performance of $B_1$ for various levels of contamination for scalar errors and using the 5% level of significance.

Fig. 23. Performance of $r_{10}$, $D_1$, $r_{20}$, $D_2$ in samples of size 5 using the 5% level of significance and sampling from a population which is 10% contaminated (panels: Location and Scalar).

Investigation has been made for 5, 10, and 20% contamination. For example,
in samples of size 5 which have 10% contamination, on the average, 59.0% of
the samples will contain no "errors", 32.8% will contain one, 7.3% two, 0.8%
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10%
contaminated with location errors having mean $\mu + 5\sigma$, about 59 contain no errors.
If the $r_{10}$ criterion is used with a 5% level of significance one value will be
"discovered" in 3.0 of the samples containing no errors. Of the 33 samples containing
one "error" the "error" would be discovered in 18 of these samples. This criterion
would discover none of the "errors" in samples containing more than one
"error". We would have obtained 18 of the 50 contaminating values and 3 which
were members of the original population.
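
The distribution of the number of "errors" per sample quoted above is binomial. A minimal sketch follows, assuming each of the 5 observations is independently a contaminator with probability 0.10; the function name is hypothetical. It reproduces the leading percentages (59.0, 32.8, 7.3, 0.8) and the count of 50 contaminating values expected in 100 samples of 5.

```python
from math import comb


def error_count_distribution(n, p):
    """Binomial probabilities of 0, 1, ..., n contaminators in a sample of size n."""
    return [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]


probs = error_count_distribution(5, 0.10)
for j, pr in enumerate(probs):
    print(f"{j} errors: {100 * pr:.1f}% of samples")

# Expected number of contaminating values among 100 samples of size 5.
expected_contaminators = 100 * 5 * 0.10
print("expected contaminating values:", expected_contaminators)
```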

When $\sigma$ is known the performance will increase when more contaminators
are present. Performance, however, has been measured in terms of finding a
single contaminator; i.e., the test has been used only once. Therefore the
measured level of performance will decrease with increasing percent
contamination. Repeated use of the test criteria has not been investigated.

















Fig. 24. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of significance when the population is 10% contaminated with location errors (panels: $r_{10}$ ($D_1$), $n = 5$; $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$), $n = 15$).

Fig. 25. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of significance when the population is 10% contaminated with scalar errors (panels: $r_{10}$ ($D_1$), $n = 5$; $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$), $n = 15$).


Criterion $B_1$ gives the best performance for both location and scalar errors for
the levels of contamination and levels of significance considered. $A$ and $C_1$ are
only slightly inferior. $B_1$ is handicapped when more than one error is present;
thus its performance is poorer for heavier contamination. Figure 19 shows the
performance of $B_1$ for the different levels of significance, 10% contamination,
and the two sample sizes 5 and 15 for location errors. Figure 20 shows the results
for scalar errors. Figures 21 and 22 show the performance of $B_1$ for the 5%
level of significance for the different levels of contamination.

Fig. 26. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of contamination for location errors and using the 5% level of significance (panels: $n = 5$ and $n = 15$).

Fig. 27. Performance of $r_{10}$ ($D_1$) and $r_{22}$ ($D_1$, $r_{20}$, $r_{21}$) for various levels of contamination for scalar errors and the 5% level of significance (panels: $n = 5$ and $n = 15$).

When $\sigma$ is not known the performance of various criteria will eventually
decrease as more and more contaminators are present in the sample even though
several of the criteria show improvement in discovering a single error if two
are present. The performance of these criteria is greatly affected by the size
of the sample. For samples of size 5, $r_{10}$ and $D_1$ perform alike, $r_{10}$ being superior
to the other 7’s (ray second best) for the levels of contamination considered, 
and $D_2$ is inferior to $r_{20}$. Figure 23 compares the performance of $r_{10}$, $D_1$, $r_{20}$,
and $D_2$ for the 5% level of significance and 10% contamination. The results
for other levels of significance and contamination are comparable.

For samples of size 15, $r_{20}$, $r_{21}$ and $r_{22}$ perform alike as do $r_{10}$, $r_{11}$ and $r_{12}$. $D_1$
and $r_{20}$, $r_{21}$, $r_{22}$ perform approximately the same and are superior to $r_{10}$, $r_{11}$,
and $r_{12}$. Critical values are not available for $D_2$ for $n > 12$. The performances
of $D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ are indicated by a single line in Figures 24, 25, 26, and 27,
which show the effect of level of significance and level of contamination on the
performance of $D_1$, $r_{20}$, $r_{21}$ and $r_{22}$ for samples of size 15 and of $r_{10}$ ($D_1$) for
samples of size 5.

Fig. 28. A comparison of the performance of $r_{22}$ and $D_1$ for two scalar contaminators when tests are made at one extreme only, $\alpha$ = 5%, $n = 15$.


7. Remarks and conclusions. Throughout the investigation of performance,
location errors were placed only at one extreme and scalar errors at either
extreme. The test for an error was made using as a suspected value the extreme
value in the direction of the location error or, in the case of the scalar error, the
value most distant from the mean. It can be expected then that if performance
were assessed when location errors could occur in either direction, different
results would be obtained. Also in the case of scalar errors, if errors were always
sought at one particular extreme or at both extremes, different results would be
obtained. If these changes were made in the models of contamination, those
criteria designed to avoid errors at the other extreme would have an advantage
over those which were not so designed for $\sigma$ unknown. If $\sigma$ is known the criteria
which do not avoid the other extreme would have an advantage over those
which do avoid the other extreme. These points just mentioned will be used to
discriminate between those criteria which were judged to be equal in performance
under the models used in the sampling study. For example, Figure 28
compares the performance of $r_{22}$ and $D_1$ for two scalar contaminators when
tests are made only at one extreme, $\alpha$ = 5%, $n = 15$.

1. For $\sigma$ known:

$B_1$ or $C_1$ should be used, or in small samples $A$, $B_1$ or $C_1$ should be used.

2. For $\sigma$ unknown:

$r_{10}$ should be used for very small samples. $r_{22}$ should be used for sample sizes
over 15. Probably $r_{21}$ would be best for sample sizes from about 8 to 13. If
simplicity in computation is not important and "errors" are not expected at both
extremes $D_1$ would do equally well. When critical values are available for larger
$n$, $D_2$ should prove useful in the larger sample sizes.


LITERATURE REFERRING TO CRITERIA LISTED IN SECTION 3


($B_1$) A. T. McKay, "The distribution of the difference between the extreme observation and
the sample mean in samples of n from a normal universe," Biometrika, Vol. 27
(1935), pp. 466-471. Procedures for obtaining percentage values given.

($B_2$) J. O. Irwin, "On a criterion for the rejection of outlying observations," Biometrika,
Vol. 17 (1925), pp. 238-250. $\Pr(B_2 > \lambda)$, $\lambda$ = .1(.1)5.0; n = 2, 3, 10(10)100(100)1,000.
Tables concerning the second and third ordered observations are also given.

($C_1$) E. S. Pearson and H. O. Hartley, "The probability integral of the range in samples
of n observations from the normal population," Biometrika, Vol. 32 (1942), pp.
301-310. 0.1%, 0.5%, 1.0%, 2.5%, 5%, 10%, n = 2(1)12, values to 20 available by
interpolation.

($C_2$) D. Newman, "The distribution of ranges in samples from a normal population,
expressed in terms of an independent estimate of the standard deviation," Biometrika,
Vol. 31 (1940), pp. 20-30. 1% and 5% points for $C_2$; for w, n = 2(1)12, 20; s, d.f. =
5(1)20, 24, 30, 40, 60, ∞.

($C_2$) E. S. Pearson and H. O. Hartley, "Tables of the probability integral of the
studentized range," Biometrika, Vol. 33 (1942), pp. 89-99. Upper and lower 5% and 1%
points for $C_2$; for w, n = 2(1)20; for s, d.f. = 10(1)20, 24, 30, 40, 60, 120, ∞.

($C_2$, $B_1$) K. R. Nair, "The distribution of the extreme deviate from the sample mean and
its studentized forms," Biometrika, Vol. 35 (1948), pp. 118-144. $B_1$ upper and lower
1%, 5%, 1%, 2.5%, 5%, 10% points for n = 3(1)9.

($D_1$, $D_2$, $F$, $B_1$) F. E. Grubbs, "Sample criterion for testing outlying observations,"
Annals of Math. Stat., Vol. 21 (1950), pp. 27-58. $F$, $D_1$: 1%, 2.5%, 5%, 10%, n ≤ 25;
$D_2$: 1%, 2.5%, 5%, 10%, n ≤ 20; $B_1$: 1%, 2.5%, 5%, 10%, n ≤ 25.

($F$) W. R. Thompson, "On a criterion for the rejection of observations and the distribution
of the ratio of deviation to sample standard deviation," Annals of Math. Stat.,
Vol. 6 (1935), pp. 214-219. 20%, 10%, 5%, n = 3(1)22(10)42, 102, 202, 502, 1002.

($F$) E. S. Pearson and Chandra Sekar give a further discussion of $F$ in "The efficiency of
statistical tools and a criterion for the rejection of outlying observations,"
Biometrika, Vol. 28 (1936), pp. 308-320. 10%, 5%, 2.5%, 1%, n = 3(1)19.

($r$'s) W. J. Dixon, "Ratios involving extreme values," Annals of Math. Stat., to be
published. $r_{10}$, $r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$: .5%, 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,

DISTRIBUTIONS RELATED TO COMPARISON OF TWO 
MEANS AND TWO REGRESSION COEFFICIENTS 


By Uttam Chand¹


University of North Carolina 


Summary. We consider here the relative merits of different statistics avail- 
able for testing two means or two regression coefficients in relation to one-sided 
(asymmetric) and two-sided (symmetric) alternatives in case of unequal popula- 
tion variances. In so far as the Behrens-Fisher statistic is concerned we confine 
ourselves to the consideration of the behavior of its probability of Type I error 
in repeated sampling from populations with a fixed value of the unknown ratio 
of variances. In connection with the tests between two means, the present 
study takes its point of departure from the existing tests and investigates the 
question of utilizing an approximately determinate knowledge about the un- 
known ratio of variances. In connection with the comparison of two regression 
coefficients and also of two linear regression functions, we consider the effect of 
two concomitant sources of variation, viz., the unknown ratio of residual variances 
and the ratio of the sums of squares of the fixed variates, on the probability of 
Type I and Type II errors of certain well known statistics. 


‘ : ; , ‘ 
1. Introduction. Consider two independent samples $x_1, \cdots, x_{n_1+1}$ and $x'_1, \cdots, x'_{n_2+1}$
drawn from two normal populations with means $m_1$ and $m_2$, variances $\sigma_1^2$ and $\sigma_2^2$.
Let $K = \sigma_1^2/\sigma_2^2$. If $K$ is known and $m_1 = m_2$, the quantity

$$t_K = (\bar{x} - \bar{x}')\left[\frac{S(x - \bar{x})^2 + K\,S(x' - \bar{x}')^2}{n_1 + n_2}\left(\frac{1}{n_1 + 1} + \frac{1}{K(n_2 + 1)}\right)\right]^{-1/2}$$

($t_1$ is Fisher's $t$) is distributed according to "Student's" distribution with $n_1 + n_2$
d.o.f.² and for the "Student's" hypothesis $H_0: m_1 = m_2$ provides a uniformly most
powerful test against an asymmetric alternative $H_1: m_1 >$ (or $<$) $m_2$ and a
type $B_1$ test against a symmetric alternative $H_2: m_1 \ne m_2$. If $K$ is unknown
certain approximate and exact tests have been suggested from time to time to
meet this situation.

Welch [1], [2], using an approximation to the distribution of $t_1$, was the first
to point out that if $K$ is unknown and we assume it to be equal to unity, then
the probability of Type I error of the $t_1$-test is subject to large variations as $K$
varies from 0 to $\infty$. He also pointed out that the statistic

$$v = (\bar{x} - \bar{x}')\left[\frac{S(x - \bar{x})^2}{n_1(n_1 + 1)} + \frac{S(x' - \bar{x}')^2}{n_2(n_2 + 1)}\right]^{-1/2}$$


1 Now Assistant Professor of Mathematical Statistics at Boston University.
2 Degrees of freedom. 










508 UTTAM CHAND 


which does not have “Student’s” distribution for $K = 1$, has the advantage
that its probability of Type I error is subject to less variation with respect to $K$.
His approximate results were later confirmed by Hsu [3] who obtained the
distribution of quantities $u_1 (= t_1^2)$ and $u_2 (= v^2)$ and also showed that these tests
are unbiased in the sense of Neyman and Pearson. Hsu concluded on the basis
of his investigations that when the sample sizes are equal and not very small,
we may safely use $u_1 (= u_2)$ as if $K$ were unity. This also had been pointed out
by Welch.

If on the basis of past experience some approximate value $k$ of $K$ were available,
one would like to know if such a choice in some rough neighborhood of $K$ would
in any way improve the claim of $t_k$ ($= t_K$ for $K = k$) for the hypothesis $m_1 = m_2$.
The distribution of this generic quantity $t_k$ ($= t_1$ for $k = 1$; $= v$ for
$k = \frac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$)
will be obtained in Section 2.1. It will be shown that variation in the probability
of Type I error of $t_k$ with respect to $K$ for any $k$, except when $t_k = v$, is essentially
similar in character to that of $t_1$ [3] and is very sensitive in a neighborhood
of $K$ in which one would very often be interested (Section 2.4). This is also true
of the behavior of the power function of $t_k$ with respect to $K$. Consequently a $t_k$
type of statistic will be unsuitable in general for utilizing an approximately
determinate knowledge of $K$.

It is not possible to infer directly from Hsu's work on the relative merits of $t_1$
and $v$ in relation to asymmetric aspects of “Student’s” hypothesis. His basic
conclusions as regards unbiasedness and the nature of variations in Type I
error in the symmetric case also hold for the asymmetric case except that the
Type I variations in $t_1$ and $v$ are less for asymmetric than for symmetric
comparisons (Section 2.5 and Table II). Furthermore it appears (Section 2.5 and
Table III) that with respect to the variations of $K$ both the asymmetric and
symmetric power functions of $t_1$ are likely to be more sensitive than those of $v$.
Since for equal d.o.f. both the asymmetric probability of Type I error and
power function are insensitive to the vagaries of the 'nuisance' parameter $K$,
there is an a fortiori reason for using $v (= t_1)$ as if $K$ were unity.

Scheffé [4] considered the statistic

$$(\bar{x} - \bar{x}')\left(\frac{\sum_{i=1}^{n_1+1}(u_i - \bar{u})^2}{n_1(n_1 + 1)}\right)^{-1/2} \qquad (n_1 \le n_2),$$

(equivalent to paired difference $t$ when $n_1 = n_2$) where $u_i = x_i - \left(\frac{n_1 + 1}{n_2 + 1}\right)^{1/2} x'_i$

and where it is assumed that the variates in each sample have been randomized.
This is essentially a “Student’s” $t$ comparison based on $n_1$ d.o.f. and as shown by
Scheffé it is impossible to get a suitable statistic with the $t$-distribution with
more than $n_1$ d.o.f. The statistic $v$ has the $t$-distribution only when $K = \infty$ ($n_1$
d.o.f.), $K = 0$ ($n_2$ d.o.f.) and $K = \frac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$ ($n_1 + n_2$ d.o.f.). For any given
$n_1$, $n_2$, $K$ and $P$ we can solve $P = P(v > t_0 \mid H_0)$ for $t_0$ and thus indirectly obtain


COMPARISON OF TWO MEANS 509 


from the tabulated values of the $t$-distribution the number of 'effective' d.o.f.
which will thus adjust $v$ to any preassigned level of significance. We try to
show in Section 2.6 that in situations where some approximate knowledge of $K$
is available, the statistic $v$ seems to have a decided advantage over any other
statistic having the $t$-distribution. We show by actual computations that Welch's
formula [2] provides a conservative estimate for the effective d.o.f. in the light of
which this comparison will be considered.

The Behrens-Fisher fiducial test employing the statistic d [5], [6], which has 
essentially the same structural form as v, has given rise to much controversy 
essentially because of inconsistencies arising from tests of significance based 
on the fiducial distribution of unknown parameters. We attempt to show in 
Section 2.7 that the fiducial test in general is ‘conservative’ in detecting significant 
results in repeated sampling from populations with a fixed value of the unknown 
ratio of variances. 

In the case of comparison of two regression coefficients when the residual
variances are unequal, we are faced with a similar type of problem. Consider
two samples $y_\mu \mid x_\mu$ and $y'_\nu \mid x'_\nu$ ($\mu = 1, \cdots, n_1 + 1$; $\nu = 1, \cdots, n_2 + 1$), where
$x_\mu$ and $x'_\nu$ are fixed and $y_\mu$ and $y'_\nu$ are normally and independently distributed
according to $N(\alpha_1 + \beta_1(x_\mu - \bar{x}), \sigma_1^2)$ and $N(\alpha_2 + \beta_2(x'_\nu - \bar{x}'), \sigma_2^2)$ respectively.
For the hypothesis $\beta_1 = \beta_2$ when the alternatives do not specify anything except
$\beta_1 > \beta_2$ or $< \beta_2$, or $\beta_1 \ne \beta_2$, we shall consider the merits of statistics $t^*$ and $v^*$
which correspond to statistics $t_1$ and $v$ for the two means. While the statistic $t^*$
is sensitive to the variation of both $K = \sigma_1^2/\sigma_2^2$ and $w$, the ratio of the sums of
squares of the fixed variates, the statistic $v^*$ is insensitive to the variation of
both. Barankin³ has extended Scheffé's test to the comparison of two regression
coefficients under the above assumptions. The statistic proposed by Barankin
has Student's distribution with $n_1 - 1$ d.o.f. ($n_1 \le n_2$) and provides the only
exact unbiased test so far known. While Scheffé's test for the comparison of
two means and Barankin's test for the comparison of two regression coefficients
should not be used when $K$ is known and were never intended to utilize any
available approximate information about $K$, the question of investigating
the possibility of using $v^*$ in the latter situation is not without interest (Section 3).
In Section 4 we consider the hypothesis of equality of two linear regression
functions, viz., $H_0: \alpha_1 = \alpha_2$, $\beta_1 = \beta_2$, when the alternatives do not specify anything
except $\alpha_1 \ne \alpha_2$ or $\beta_1 \ne \beta_2$.

In studying the behavior of the power function and the probability of Type I 
error of certain statistics under discussion we have made full use of Hsu’s method 
and consequently only essential details have been given here. 
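
The statistics discussed in this section can be sketched in a few lines. The following is an illustration only, under the reading that $t_k$ pools the two sums of squares with the assumed ratio $k$ and that $v$ uses the separate variance estimates; the function names and the sample values are hypothetical. It also checks the relation noted above, that $t_k$ coincides with $v$ when $k = n_1(n_1+1)/[n_2(n_2+1)]$.

```python
from math import sqrt


def _mean_and_ss(values):
    """Return the mean and the sum of squared deviations about it."""
    m = sum(values) / len(values)
    return m, sum((x - m) ** 2 for x in values)


def t_k(x, y, k):
    """t_k for an assumed variance ratio k; samples have sizes n1 + 1 and n2 + 1."""
    n1, n2 = len(x) - 1, len(y) - 1
    mx, ssx = _mean_and_ss(x)
    my, ssy = _mean_and_ss(y)
    pooled = (ssx + k * ssy) / (n1 + n2)        # estimate of sigma_1^2 if K = k
    scale = 1 / (n1 + 1) + 1 / (k * (n2 + 1))   # variance factor of the mean difference
    return (mx - my) / sqrt(pooled * scale)


def welch_v(x, y):
    """Statistic v built from the two separate variance estimates."""
    n1, n2 = len(x) - 1, len(y) - 1
    mx, ssx = _mean_and_ss(x)
    my, ssy = _mean_and_ss(y)
    return (mx - my) / sqrt(ssx / (n1 * (n1 + 1)) + ssy / (n2 * (n2 + 1)))


# Illustrative samples (hypothetical): n1 + 1 = 3 and n2 + 1 = 5 observations.
x = [5.1, 4.9, 6.2]
y = [3.8, 4.4, 4.1, 3.9, 4.6]
n1, n2 = len(x) - 1, len(y) - 1
k_v = n1 * (n1 + 1) / (n2 * (n2 + 1))

print("t_1 (Fisher's t):", t_k(x, y, 1.0))
print("v:", welch_v(x, y))
print("t_k at k = n1(n1+1)/[n2(n2+1)]:", t_k(x, y, k_v))  # agrees with v
```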


2. Hypothesis of equality of two means when variances are unequal 


2.1. The distribution of $t_k$ for any values of $n_1$ and $n_2$. Consider the test function
$t_k$ ($= t_K$ for $K = k$; Section 1) where $k$ is some inexact value of $K$. This can be




3 E. W. Barankin, "Extension of the Romanovsky-Bartlett-Scheffé test," Proc. Berkeley
Symposium on Math. Stat. and Prob., University of California Press, 1949, pp. 433-449.










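put in the form $t_k = (\xi + \delta)(b\chi_1^2 + c\chi_2^2)^{-1/2}$ where $\xi$ is $N(0, 1)$ and the $\chi^2$'s
have independent $\chi^2$-distributions with $n_1$ and $n_2$ d.o.f., and where

$$\delta = (m_1 - m_2)\left(\frac{\sigma_1^2}{n_1 + 1} + \frac{\sigma_2^2}{n_2 + 1}\right)^{-1/2},$$

$$b = (K/k)\,(n_1 + n_2)^{-1}\,[k(n_2 + 1) + n_1 + 1]\,[K(n_2 + 1) + n_1 + 1]^{-1},$$

$$c = (n_1 + n_2)^{-1}\,[k(n_2 + 1) + n_1 + 1]\,[K(n_2 + 1) + n_1 + 1]^{-1},$$

$$b/c = K/k.$$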
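
As a check on the coefficients $b$ and $c$, the following minimal sketch evaluates them for given $K$, $k$, $n_1$, $n_2$ and confirms the two properties used in the text: $b/c = K/k$, and $b = c = (n_1 + n_2)^{-1}$ when $k = K$. The function name is hypothetical and the formulas are those displayed above.

```python
def mixing_coefficients(K, k, n1, n2):
    """Coefficients b and c in t_k = (xi + delta) * (b*chi1^2 + c*chi2^2)^(-1/2)."""
    common = (k * (n2 + 1) + n1 + 1) / ((n1 + n2) * (K * (n2 + 1) + n1 + 1))
    return (K / k) * common, common


# Illustrative values: true ratio K = 5, assumed ratio k = 1, n1 = 2, n2 = 4.
b, c = mixing_coefficients(5.0, 1.0, 2, 4)
print("b =", b, " c =", c, " b/c =", b / c)

# With k = K both coefficients reduce to 1 / (n1 + n2).
b_eq, c_eq = mixing_coefficients(5.0, 5.0, 2, 4)
print("k = K:", b_eq, c_eq)
```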


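In what follows we shall omit the subscript $k$ from $t_k$. The joint probability
element of $\xi$, $\chi_1^2$ and $\chi_2^2$ is given by

$$dF(\xi, \chi_1^2, \chi_2^2) = (2\pi)^{-1/2}\,[\Gamma(n_1/2)\,\Gamma(n_2/2)]^{-1}\,2^{-2}\,e^{-\frac{1}{2}(\xi^2 + \chi_1^2 + \chi_2^2)}\,(\chi_1^2/2)^{(n_1-2)/2}\,(\chi_2^2/2)^{(n_2-2)/2}\,d\xi\,d(\chi_1^2)\,d(\chi_2^2).$$

We transform to new variables $t$, $r$ and $\theta$ by the relations

$$\xi + \delta = t\,(b\chi_1^2 + c\chi_2^2)^{1/2},$$
$$b\chi_1^2 = r^2\cos^2\theta \qquad (0 < \theta < \pi/2),$$
$$c\chi_2^2 = r^2\sin^2\theta \qquad (-\infty < t < +\infty),$$

and integrate out $r$. To integrate out $\theta$ we put $z = \sin^2\theta$ if $b < c$ and $z = \cos^2\theta$
if $b > c$. This reduces the integration w.r.t. $\theta$ to a series of hypergeometric
integrals. We finally have the following form for the frequency function of $t_k$: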


evr (b/e)"*"? c " (st)" (2by""P r(™ > No . ¥ aa ‘) 


pena 
(2.1.1) r(5)r("+ *) r=0 hi + of) +m srs 1 


2 


2 





P (mtetet) ma ma + m2 I =) 
2 a 2 ’1+ bf? 
where $F$ denotes the hypergeometric function. As a check, if we put $b = c =
(n_1 + n_2)^{-1}$, we get the frequency function of non-central $t$ for $n_1 + n_2$ d.o.f. For
the case $b > c$ we have only to interchange $b$ with $c$ and $n_1$ with $n_2$.
The null distribution of $t_k$ ($\delta = 0$) is an even function of $t$; consequently the
forms of the single and two-equal-tailed probability of Type I error will be the
same except for the constant $\frac{1}{2}$. If we let $\beta_1(\delta, K, k, n_1, n_2) = \int_{t_0}^{\infty} q(t)\,dt$ denote the
single upper tail power function of $t_k$, from (2.1.1) we obtain


Bi(6, K, k, mi, ms) = 36° "(K/k)"*" DU DL 


h=0 r= 


(2.1.2) (52/2)? p a 4 h)(1 _K) 


) 
a k le, (* + Ne +h, r+ r} 
at) a 2 
3° 


Av \e 


\w 




where 2» = (1 + bt’) and J,,(p, q) is the incomplete beta ratio. To obtain the 
two equal tailed power function $\beta_2(\delta, K, k, n_1, n_2)$ we need only change $r$ into $2r$
and omit the factor $\frac{1}{2}$.

2.2. Distribution of $t_k$ for even values of $n_1$ and $n_2$. (For notation refer to Section
2.1). When $n_1$ and $n_2$ are even, the method of characteristic functions yields a
single infinite series for the distribution of $t_k$, and when $\delta = 0$ this series reduces
to mam terms. The characteristic function of X = bxi + cx3 is given by 
$\phi(\tau) = (1 - 2bi\tau)^{-n_1/2}\,(1 - 2ci\tau)^{-n_2/2}$. To obtain the form of the frequency
function of $X$ we make use of the inversion theorem and integrate round a standard
contour in the lower half of the complex plane. The distribution of $t_k$ can then be
obtained from the joint probability element of $\xi$ and $X$. We obtain the following
form for the single tailed power function of $t_k$:


ease wen (8°/9)" > nol2 (ni/2)—1 
8,(8, K, km, 12) = go? 52 @ 2) (=) > 
r=0 , ' K-—k h—=0 


bol 





h=0 mi\ 2, 
r(®) a 


K \._,(n r+i1 i 
(4) I se (* _ h, 9 ) (K > k) 


where 2 has been defined in the previous section and 2) = (1 + cls)’. 
2.3. Unbiasedness of a test based on $t_k$. Since the single and two tailed forms
of the power function of $t_k$ (Section 2.1) are essentially the same functions of the


9 k /2 ~~ r (3 + i) 
+ (—*( vu ” _ 


standardised ‘distance’ 6, following Hsu [3] we can show that * > 0 and = >0 


for any fixed K and k; and consequently such a generic type of statistic provides 
an unbiased test both against symmetric and asymmetric alternatives. 

2.4. Variations in the power function and the probability of Type I error of $t_k$.
For the case $k = 1$, Hsu [3] has already shown that the probability of Type I
error of the statistic $t_1$ is subject to large variations w.r.t. $K$. He also pointed
out that the behavior of the derivative of its power function w.r.t. $K$ for fixed $\delta$
was similar to that of its probability of Type I error w.r.t. $K$. We shall presently
see that $t_k$ also shares this property with $t_1$.

In the first place one would like to know if any choice of $k$ in a small
neighborhood of $K$ would stabilize the variations in the Type I error of $t_k$ to such an
extent as to make it approximately insensitive to the difference between $k$ and
$K$. With this end in view we shall examine the nature of variations in the
probability of Type I error of $t_k$ w.r.t. $K$ for any fixed $k$.
From (2.1.2) by putting $\delta = 0$ we obtain


P = Pe > &) = M/E (B+ hb) — K/iy 
=0 “ 


(2.4.1) as - 
Ne Ny Ne 1 
(r(G)ra +o) m(B 5" + 44). 


We now differentiate (2.4.1) and after simplification obtain 


=z < CAK/B tnd + 1) — alms + D/EIK Ge + 12) + a. + IK < BD. 





Similarly 
> C.[ne(n2 + 1) — mlm + 1)/A[K(me + 1) + m4 I" (K > k), 


where $C_1$ and $C_2$ are certain positive constants independent of $K$ and $k$.


If k = n(n + 1) 


we have 
No(Ne2 + 1) 


S| Q, 
Py 
VIA 
oO 


for K =k. 

This is the case when $t_k$ is identical with the statistic $v$ defined in Section 1
and the probability of Type I error curve expressing $P$ as a function of $K$ has a
minimum at this point: for $n_1 < n_2$ the minimum occurs for a value of $K < 1$
and vice versa. And since $v$ is known to be insensitive to the variation of $K$ [3],
therefore $t_k$ is insensitive to the variation of $K$ for this value of $k$.

For any other assumed value of $k$ the curve either starts decreasing from
$K = \infty$ or from $K = 0$ to the point where $K = k$, depending upon the values of
$n_1$ and $n_2$. In each case the ordinate of the curve continues to decrease for some
distance; it may decrease to a minimum and then start increasing or else decrease
indefinitely. For fixed $\delta$ the power function of $t_k$ also has a minimum when
$K = k = \dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$; and for any other $k$ the behavior of its
power function is similar to that of its probability of Type I error. For the case
$k = 1$ numerical values of the single and two-tailed values of the probability of
Type I error and power function for different values of $n_1$, $n_2$ and $K$ are given
in Tables II and III (Section 2.5).

In certain practical situations it may happen, for example, that on the basis
of past experience one can determine $k$ so that $\tfrac{1}{2} < |k - K| < 2$. The question
arises: how sensitive is $t_k$ to such a neighborhood for any $k$, $K$, $n_1$ and $n_2$?
That it is hard to provide a practically useful answer to this question will be
apparent from the nature of the distribution of $t_k$, which depends both on
$K$ and $k$ and not merely on their ratio. Table I indicates how
in such a small neighborhood $P(t_k > t_0)$ can be in serious error in two different
directions.

2.5. Statistics $t_1$ and $v$ in relation to asymmetric and symmetric aspects of
"Student's" hypothesis. Statistics $t_1$ and $v$ are special cases of $t_k$ and the behavior
of their probability of Type I error and power function has already been discussed
(Sections 2.3 and 2.4). In this section we compare the single-tailed and two-tailed
values of the probability of Type I error and power function in the light
of several particular examples. In all these calculations, e.g. in $P(t > t_0)$ and
$P(|t| > t_0)$, the critical value refers to the single-tailed value of Fisher's $t$ in the
first case and to the two-tailed value in the second, for the appropriate number of
d.o.f. Tables II and III give the approximate values for the probability of Type I
error and the power function respectively, both against symmetric and asymmetric
alternatives.


TABLE I
Variations in $P(t_k > t_0)$ with respect to $k$ for fixed $K$
($K = 5$; $n_1 = 2$; $n_2 = 4$; $t_0 = 2.447$)

k               1       2       3       4       5       6       7
P(t_k > t_0)    .1129   .0936   .0749   .0607   .05     .0418   .0355


TABLE II
Variations in the symmetric and asymmetric probability of Type I error of $v$ and $t_1$ in
relation to the unknown ratio of variances $K$

Sample sizes        Statistic   K = 0     .125     .5       1       2       4        8       16      ∞       % point of tabulated t
n_1 = n_2 = 3       v = t_1     .074      .0633    .0504    .05     .0504   .0568    .0633   .0691   .074    single-tailed 5%
                    v = t_1     .092      .0681    .0525    .05     .0525   .0597    .0681   .0770   .092    two-tailed 5%
                    v = t_1     .034      .0181    .0110    .01     .0110   .0138    .0181   .0227   .034    two-tailed 1%
n_1 = 4, n_2 = 16   v           .0112     .0129†   .0142    .0195   .0227   .0265    .0293   .0305   .0324   single-tailed 1%
                    v           .012      .0161†   .0197    .0238   .0204   .0359    .0407   .0433   .0465   two-tailed 1%
n_1 = 8, n_2 = 4    v           .075      .0687    .0598    .0543   .0541   .0511‡   .0521   .0531   .056    single-tailed 5%
n_1 = 4, n_2 = 16   t_1         .00011    .00043   .00310   .01     .0221   .0483    .0793   .0864   .133    single-tailed 1%
                    t_1         .00007    .00031   .00244   .01     .0310   .0592    .1169   .1544   .222    two-tailed 1%
n_1 = 8, n_2 = 4    t_1         .1342     .1056    .0710    .05     .0368   .0287    .0246   .0224   .0204   single-tailed 5%

† P = .01 when K = .074
‡ P = .05 when K = 3.6

For equal sample sizes ($v = t_1$) the Type I error and power function curves,
representing probability of Type I error and power function as a function of $K$,
have a minimum when $K$ is unity, and a maximum occurs when $K$ is either zero or
infinity. Maximum values of the probability of Type I error for several equal
sample sizes are given in Table IV. It appears that for equal sample sizes the
probability of Type I error and the power function are likely to be insensitive
to the variation of $K$. We also notice in this connection that while the
single-tailed values of the probability of Type I error are less than the two-tailed
values, the values of the two-tailed power function for $\delta = 1$ are less
than the corresponding single-tailed values. This appears to be true also for the
statistic $v$ when $n_1 \ne n_2$. For unequal sample sizes also the probability of Type I
error and the power function of $t_1$ are likely to be more sensitive to the variation
of $K$ than those of $v$. It may be pointed out that, while it is recognized
that for unequal d.o.f. a fair comparison of the probability of Type I error and
the power function of $v$ with those of $t_1$ ought to adjust $v$ and $t_1$ to the same level
of significance, namely the same maximum (for all $K$) probability of Type I
error, this would not alter our conclusions about the sensitive nature of $t_1$.


TABLE III⁴
Variations in the asymmetric and symmetric power function of $t_1$ and $v$ corresponding to the
5% point of tabulated $t$ ($\delta = 1$)

Sample sizes        Statistic   K = 0    .5      1       2       ∞
n_1 = n_2 = 3       v = t_1     .189     .141    .137    .141    .189    symmetric
                    v = t_1     .269     .229    .2255   .229    .269    asymmetric
n_1 = 8, n_2 = 4    t_1         .354     .202    .152    .112    .063    symmetric
                    t_1         .428     .294    .2425   .194    .122    asymmetric
n_1 = 8, n_2 = 4    v           .208     .196    .162    .156†   .168    symmetric
                    v           .286     .299    .247    .244‡   .255    asymmetric

† minimum of .152 is reached for K = 3.6.
‡ minimum of .242 is reached for K = 3.6.


TABLE IV
Maximum probability of Type I error of $v$ ($= t_1$) for equal degrees of freedom

                         Symmetric          Asymmetric
n_1 + 1 = n_2 + 1        5%       1%        5%       1%
 7                       .0721    .0224     .0625    .0182
 9                       .0668    .0193     .0595    .0162
11                       .0635    .0173     .0576    .0150
15                       .0598    .0152     .0555    .0136
21                       .0569    .0137     .0538    .0125
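
The entries of Table IV can be checked numerically under one assumption that is ours rather than an explicit statement of the paper: that at $K = 0$ or $K = \infty$ the statistic $v$ behaves like Student's $t$ with $n$ d.o.f., while the critical value is the tabulated $t$ for $2n$ d.o.f. The Python sketch below uses only the standard library. The helper names are hypothetical, and the critical values 2.179, 1.782 and 2.120 (5% points of $t$ with 12 and 16 d.o.f.) are supplied as inputs for illustration.

```python
import math


def t_pdf(x: float, nu: int) -> float:
    """Density of Student's t with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def t_upper_tail(x: float, nu: int, steps: int = 2000) -> float:
    """P(T > x) for x >= 0, using composite Simpson's rule on [0, x]."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3.0


def max_type1_error(n: int, crit: float, two_tailed: bool) -> float:
    """Tail probability of t with n d.o.f. beyond a critical value for 2n d.o.f.

    Assumption (ours): this is the limiting Type I error of v at K = 0 or infinity.
    """
    tail = t_upper_tail(crit, n)
    return 2.0 * tail if two_tailed else tail


if __name__ == "__main__":
    print(round(max_type1_error(6, 2.179, True), 4))   # symmetric 5%, n = 6
    print(round(max_type1_error(6, 1.782, False), 4))  # asymmetric 5%, n = 6
    print(round(max_type1_error(8, 2.120, True), 4))   # symmetric 5%, n = 8
```

Under this assumption the computed values agree closely with the 5% entries in the first two rows of Table IV (.0721, .0625 and .0668).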


2.6. Statistic $v$, Scheffé's test and paired difference $t$. If $K$ is known, $v$ or Scheffé's
statistic $S$ should not be used. If $K$ is unknown, $S$ is an ingenious device for
getting a Student's $t$ with $\min(n_1, n_2)$ d.o.f. and provides the only exact unbiased
test so far known. In such a situation, since nothing is known about $K$, a
fair comparison of the power function of $S$ with $v$ ought to adjust $v$ to the same
maximum probability of Type I error for all $K$ (the maximum will occur for $K = 0$
or $K = \infty$ according as $n_1 > n_2$ or $n_1 < n_2$); and at such a maximum significance
level it is recognized that $v$ cannot be uniformly better than $S$. For samples of equal
size $n$ the use of the paired difference $t$ with $n - 1$ d.o.f. (equivalent to $S$ when
$n_1 = n_2$; Section 1) provides a suitable test for two reasons: (i) it is exact and
(ii) as shown by Walsh [8] it has a high power efficiency.

4 The author acknowledges with pleasure the help given in the preparation of this table
by Miss Elizabeth Shuhany of the Statistical Laboratory, Boston University.

5 Values taken from [7].

If any approximate a priori information about $K$ is available, $v$ appears to
be the only suitable statistic to utilize such information. While $S$ was not intended
to cope with such a situation, $t_k$ (Section 2.4) has been shown to be unsuitable.
Since $v$ is insensitive to the variation of $K$, we shall not be far wrong in using
'effective' d.o.f. based upon an assumed value $k$ of $K$ satisfying some such relation
as $\tfrac{1}{2} < |k - K| < 2$. The effective d.o.f. of $v$ as given by Welch [1] and as given
by $P = P(v > t_0)$ or by $P = P(|v| > t_0)$ for fixed $P$ (listed in Table V as calculated
d.o.f.) are identical for (i) $K = 0$, $1$, and $\infty$ ($n_1 = n_2$) and (ii) $K = 0$,
$\dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$, and $\infty$ ($n_1 \ne n_2$). For other values of $K$ it appears from
Table V that Welch's formula errs on the conservative side. The effective number of
d.o.f. varies between $n_1 + n_2$ and $\min(n_1, n_2)$ (cf. d.o.f. for $S$). Consequently in
the absence of any best unbiased test and in the light of any approximate information
about $K$ it would appear that $v$ has a decided advantage over any other statistic.


TABLE V
Adjusted power function of $v$ in the light of 'effective' degrees of freedom

                              Adjusted asymmetric power function of v
                              for probability of Type I error of .05                Effective d.o.f.
                              δ = 1                      δ = 2                      Calculated                  Welch's formula
Sample size             K =   0     .125  4     ∞       0     .125  4     ∞        0     .125  4      ∞        0     .125  4      ∞
n_1 + 1 = n_2 + 1 = 3         .174  .204  .204  .174    .384  .476  .476  .384     2     3.36  3.36   2        2     2.94  2.94   2
n_1 + 1 = n_2 + 1 = 7         .225  .236  .236  .225    .550  .581  .581  .550     6     9.14  9.14   6        6     8.82  8.82   6
n_1 + 1 = 9; n_2 + 1 = 5      .210  .227  .242  .233    .504  .556  .594  .572     4     6.50  11.90  8        4     5.14  11.90  8
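
As a rough illustration of how 'effective' d.o.f. respond to $K$, the Python sketch below evaluates the familiar Welch-Satterthwaite approximation with population variances plugged in. That this is the form of Welch's formula [1] used for Table V is an assumption on our part, as are the conventions that $K = \sigma_1^2/\sigma_2^2$ and that the sample sizes are $n_1 + 1$ and $n_2 + 1$. The function name is hypothetical.

```python
def welch_effective_dof(K: float, n1: int, n2: int) -> float:
    """Welch-Satterthwaite effective d.o.f. as a function of the variance ratio K.

    Assumptions (not taken from the paper): K = sigma1^2 / sigma2^2 and the
    samples have sizes n1 + 1 and n2 + 1, so the variances of the two sample
    means are in the ratio r = K * (n2 + 1) / (n1 + 1).
    """
    r = K * (n2 + 1) / (n1 + 1)
    return (r + 1.0) ** 2 / (r * r / n1 + 1.0 / n2)


if __name__ == "__main__":
    for n1, n2 in [(2, 2), (6, 6), (8, 4)]:
        k_star = n1 * (n1 + 1) / (n2 * (n2 + 1))
        row = [welch_effective_dof(K, n1, n2) for K in (0.0, 1.0, 4.0, k_star)]
        print(n1, n2, [round(v, 2) for v in row])
```

With $n_1 = n_2 = 2$ and $K = 4$ the function returns about 2.94, and with $n_1 = n_2 = 6$ about 8.82. At $K = n_1(n_1+1)/[n_2(n_2+1)]$ it returns $n_1 + n_2$, and at $K = 0$ it returns $n_2$, in line with the range of effective d.o.f. described above.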
2.7. The Behrens-Fisher test in repeated sampling. Consider the statistic

$d = (\bar{x} - \bar{x}')(s_1^2 + s_2^2)^{-1/2} = t_1 \sin\theta - t_2 \cos\theta,$

where $s_1^2$ and $s_2^2$ are the unbiased estimates of the variances of the means $\bar{x}$ and $\bar{x}'$
respectively, $t_1$ and $t_2$ have independent "Student's" distributions with $n_1$ and $n_2$
d.o.f. respectively, and $\tan\theta = s_1/s_2$. On the basis of the "fiducial" distribution of
$\sigma_1^2$ and $\sigma_2^2$ Fisher [6] regards $d$ as a "mixture" of $t_1$ and $t_2$ with constant coefficients.
It is to be noted that if $s_1$ and $s_2$ are fixed in the classical sense, $t_1$ and $t_2$ have
independent normal conditional distributions with zero means and variances
$\sigma_1^2/s_1^2$ and $\sigma_2^2/s_2^2$ respectively; and if $s_1$ and $s_2$ vary in their own distribution, $d$ is
identical with $v$ (Section 1).

Neyman [9] considered the integral of the joint probability law of $\bar{x}$, $\bar{x}'$, $s_1^2$, $s_2^2$
over the set

$\dfrac{\bar{x} - \bar{x}'}{\sqrt{s_1^2 + s_2^2}} < t_1 \sin\theta - t_2 \cos\theta,$

where the quantity on the right also depends upon $s_1$ and $s_2$ and is the quantity $d$
tabulated by Sukhatme [10], [11]. Neyman showed in particular that if pairs of normal
populations with different $K$ are sampled ($n_1 + 1 = 13$, $n_2 + 1 = 7$), then the relative
frequency of correct statements about $m_1 - m_2$ based on the 5% points of $d$ will not be
equal to the expected .95 and will vary with $K$.
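
The algebraic identity behind the two forms of $d$ can be confirmed on arbitrary numbers. The Python sketch below is an illustration only: the function names are ours, and it assumes that both populations share a common true mean (the hypothesis under test), so that $t_1$ and $t_2$ are the two sample means measured from that mean in units of their standard errors.

```python
import math


def d_statistic(xbar1: float, xbar2: float, s1: float, s2: float) -> float:
    """d = (xbar1 - xbar2) / sqrt(s1^2 + s2^2); s1, s2 are standard errors of the means."""
    return (xbar1 - xbar2) / math.hypot(s1, s2)


def d_as_mixture(xbar1: float, xbar2: float, s1: float, s2: float,
                 common_mean: float) -> float:
    """The same quantity written as t1*sin(theta) - t2*cos(theta), tan(theta) = s1/s2."""
    t1 = (xbar1 - common_mean) / s1
    t2 = (xbar2 - common_mean) / s2
    theta = math.atan2(s1, s2)  # angle whose tangent is s1/s2
    return t1 * math.sin(theta) - t2 * math.cos(theta)


if __name__ == "__main__":
    args = (10.4, 9.1, 0.8, 1.7)
    print(d_statistic(*args), d_as_mixture(*args, common_mean=9.5))
```

The two printed values coincide whatever common mean is supplied, because the common mean cancels in the difference of the two terms.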

We consider here the following similar type of question: what is the nature of 
discrepancies that will arise in the probability of Type I error by the repeated 
use of the Behrens-Fisher test in sampling from two normal populations? We 
observe that since d and v have the same structural form, the appropriate 
probability of Type I error in such a situation will be given by the probability 
integral of v (Sections 2.2 and 2.5). 











TABLE VI
Minimum and maximum† values of $P(|v| > d_0)$ for different values of $K$

Sample sizes                     K = 0     .05      1        2        ∞        d_0
n_1 + 1 = n_2 + 1 = 7            .05       .0321    .0307    .0321    .05      2.447
                                 .0508     .0329    .0313    .0329    .0508    2.435
n_1 + 1 = n_2 + 1 = 9            .05       .0362    .0346    .0362    .05      2.306
                                 .0512     .0367    .0358    .0367    .0512    2.292
n_1 + 1 = n_2 + 1 = 13           .05       .0405    .0396    .0405    .05      2.179
                                 .0507     .0434    .0403    .0434    .0507    2.170
n_1 + 1 = 7, n_2 + 1 = 13        .0307     .0281    .0317    .0393    .05      2.447
                                 .05       .0460    .0516    .0597    .0720    2.179
n_1 = n_2 = ∞                    .05       .05      .05      .05      .05      1.960

† The maximum values are those on the second line of each pair (lower $d_0$).


We observe that $P(|v| > x)$ is a monotone decreasing function of $x$ for any
fixed $K$, $n_1$ and $n_2$. Furthermore for fixed $x$, $n_1$ and $n_2$ we have
$\dfrac{\partial P}{\partial K} \gtrless 0$ for (i) $K \gtrless 1$, $n_1 = n_2$ and
(ii) $K \gtrless \dfrac{n_1(n_1 + 1)}{n_2(n_2 + 1)}$, $n_1 \ne n_2$. Table VI gives the minimum
and maximum values of $P(|v| > d_0)$ for different values of $K$, where $d_0$ corresponds
to the highest and lowest value of tabulated $d$. It appears that for equal
sample sizes the minimum probability of Type I error is less than .05 and will
converge to .05 when $K$ is either infinity or zero. The maximum probability of
Type I error converges to a value slightly higher than .05. This probability also
converges to .05 with increasing size of equal samples for every $K$. For unequal
sample sizes, e.g. $n_1 < n_2$, the minimum values converge to .05 when $K = \infty$, and
if $n_1 > n_2$, this convergence takes place when $K = 0$. The maximum values
are both greater and less than .05.





3. Hypothesis of equality of regression coefficients when residual variances 
are unequal. 


3.1. Unbiasedness of tests based on statistics $t^*$ and $v^*$. Consider

$t^* = (b_1 - b_2)\left[\dfrac{S(y - Y)^2 + S'(y' - Y')^2}{n_1 + n_2 - 2}\left(\dfrac{1}{M_1} + \dfrac{1}{M_2}\right)\right]^{-1/2}$

and

$v^* = (b_1 - b_2)\left[\dfrac{S(y - Y)^2}{M_1(n_1 - 1)} + \dfrac{S'(y' - Y')^2}{M_2(n_2 - 1)}\right]^{-1/2},$

where $b_1$ and $b_2$ are regression coefficients calculated from independent samples; $Y$
and $Y'$ are the sample regression functions; $M_1 = S(x - \bar{x})^2$ and $M_2 = S'(x' - \bar{x}')^2$.
Under the assumptions of Section 1 these two quantities are distributed as

$t^* = (\xi + \Delta)(\mu_1\chi^2_{1,n_1-1} + \mu_2\chi^2_{2,n_2-1})^{-1/2},$
$v^* = (\xi + \Delta)(\lambda_1\chi^2_{1,n_1-1} + \lambda_2\chi^2_{2,n_2-1})^{-1/2},$

respectively, where $\xi$ is $N(0, 1)$ and the $\chi^2$'s have independent $\chi^2$-distributions
with d.o.f. indicated in the second subscripts, and where

$M_1/M_2 = w, \qquad \sigma_1^2/\sigma_2^2 = K,$
$\mu_1 = K(w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1},$
$\mu_2 = (w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1},$
$\Delta = (\beta_1 - \beta_2)\left(\dfrac{\sigma_1^2}{M_1} + \dfrac{\sigma_2^2}{M_2}\right)^{-1/2},$
$\lambda_1 = K(K + w)^{-1}(n_1 - 1)^{-1},$
$\lambda_2 = w(K + w)^{-1}(n_2 - 1)^{-1}.$

Consequently these two statistics have the same basic distribution as obtained
previously for $t_k$ (Section 2.1) and their power functions are monotone increasing
functions of the standardized 'distance' $\Delta$ for fixed values of $K$, $w$, $n_1$ and $n_2$.
While the statistic $t^*$ has "Student's" distribution with $n_1 + n_2 - 2$ d.o.f.
whenever $K = 1$, the statistic $v^*$ is only so distributed when
$K = \dfrac{w(n_1 - 1)}{n_2 - 1}$.
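
The two exactness conditions can be verified directly from the weights. The Python sketch below is a check added for illustration, with function names of our own choosing; it evaluates $\mu_1, \mu_2, \lambda_1, \lambda_2$ in exact rational arithmetic and confirms that both weights of a statistic equal $(n_1 + n_2 - 2)^{-1}$ under the stated condition, which is what makes the denominator a single $\chi^2$ divided by its d.o.f.

```python
from fractions import Fraction


def t_star_weights(K, w, n1, n2):
    """(mu_1, mu_2): weights of the two chi-squares in the denominator of t*."""
    K, w = Fraction(K), Fraction(w)
    denom = (K + w) * (n1 + n2 - 2)
    return K * (w + 1) / denom, (w + 1) / denom


def v_star_weights(K, w, n1, n2):
    """(lambda_1, lambda_2): weights of the two chi-squares in the denominator of v*."""
    K, w = Fraction(K), Fraction(w)
    return K / ((K + w) * (n1 - 1)), w / ((K + w) * (n2 - 1))


if __name__ == "__main__":
    n1, n2, w = 7, 4, Fraction(3, 2)
    print(t_star_weights(1, w, n1, n2))           # K = 1: equal weights for t*
    K_exact = w * (n1 - 1) / (n2 - 1)             # K = w(n1 - 1)/(n2 - 1)
    print(v_star_weights(K_exact, w, n1, n2))     # equal weights for v*
```

For any other $K$ the two weights differ, and the denominator is a genuine mixture of two $\chi^2$ variates.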

3.2. Variations in the probability of Type I error and power function of $t^*$ and $v^*$.
The behavior of the partial derivatives of the probability of Type I error and
the power function of $t^*$ and $v^*$ w.r.t. $K$ and also in relation to $w$ is essentially
the same. For purposes of illustration we shall only consider the behavior of the
probability of Type I error. We shall presently see that for the hypothesis
$\beta_1 = \beta_2$ (cf. "Student's" hypothesis $m_1 = m_2$), while $t^*$ is sensitive to the variation
of $K$ and $w$, $v^*$ is insensitive to both.

3.2.1. Variations w.r.t. K for fixed w. Remembering that the $\chi^2$'s in the
denominator of $t^*$ have respectively $n_1 - 1$ and $n_2 - 1$ d.o.f., we can write down
$P(t^* > t_0)$ from the corresponding form for $t_k$ (Section 2.3). After simplification
we obtain


(3.2.1.1) oe < lie - ) -ee- c+ we (K <1), 










where $x_0 = (1 + \mu_1 t_0^2)^{-1}$. If we make use of the relation $P(n_1, n_2, M_1, M_2, K) =
P(n_2, n_1, M_2, M_1, K^{-1})$ in (3.2.1.1) we obtain


(3.2.1.2) a i+ ea =< 1) ~~ (K > 1), 


where $L_1$ and $L_2$ are certain positive constants independent of $M_1$, $M_2$ and $K$.
Similarly for the statistic $v^*$ we have


oP 


(3.2.1.3) aK < Di(K¢)"[(m2 — 1) — w(m — 1)¢]/(K + w) (Ko < 1) 
and 
(3.2.1.4) am > D[(nz — 1) — w(m — 1)¢]/(K + w) (K¢ > 1), 


where $D_1$ and $D_2$ are certain positive constants independent of $K$, $M_1$ and $M_2$ and
where $\phi = \dfrac{n_2 - 1}{w(n_1 - 1)}$. We notice that if (i) $n_1 = n_2$ and $w = 1$ or
(ii) $w = \dfrac{n_2 - 1}{n_1 - 1}$, we have $t^* = v^*$ and both from (3.2.1.1), (3.2.1.2) and
from (3.2.1.3), (3.2.1.4) we obtain $\dfrac{\partial P}{\partial K} = 0$ for $K = 1$. In case (i) the
maximum probability of Type I error occurs at $K = \infty$ and $K = 0$. In case (ii) the
maximum will sometimes occur for $K = 0$ and sometimes for $K = \infty$, depending on the
relative magnitude of $n_1$ and $n_2$.

For other situations $t^*$ and $v^*$ exhibit a type of behavior essentially similar
to that of $t_1$ and $v$ (Section 2.5). We notice that the $(P, K)$ curve for $v^*$ has a
minimum when $K = \dfrac{w(n_1 - 1)}{n_2 - 1}$. If $n_1 = n_2$, the minimum point is given by
$K = w$. Therefore with an approximate knowledge of $K$, a useful practical hint
to remember is to so adjust $M_1$ and $M_2$ as to have $w$ approximately equal to $K$.
If $n_1 \ne n_2$ any information about $\sigma_1^2$ being greater or less than $\sigma_2^2$ can be used
with decided advantage to adjust $M_1$, $M_2$, $n_1$ and $n_2$ so as to reduce considerably
the risk of the first kind and thus work in a region of the $(P, K)$ curve where
there is not much danger of bias in the probability of Type I error. This will
also reduce the fluctuations of the power function of $v^*$ about its minimum, which
also occurs for $K = \dfrac{w(n_1 - 1)}{n_2 - 1}$.

3.2.2. Variations in relation to w for fixed K. The partial derivative of $P(t^* > t_0)$
with respect to $w$ is given by





OF = 40 — KK" (K + wy? Da — KY 
ow h=0 
(3.2.2.1) r(™ - 5 4 h) rae fi.» ) 
9 «<0 


celta — T\ osm + me — 2 
ane) a(R? 4 0,2) 


(K <1). 




Therefore $\dfrac{\partial P}{\partial w} > 0$ for $K < 1$. Similarly $\dfrac{\partial P}{\partial w} < 0$ for $K > 1$.


To justify the differentiation of the series in (3.2.2.1) we make use of the result 


_ — 
if ee eee +h, :) wi 1,(mtm—? + & +1, 1) 


— — 


+n2—2)/2+h 
zo"! n2—2)/ (1 _ z0)' 


(CREP i)a(MER=F a 


and consequently the series under consideration may be shown to be dominated
by an absolutely and uniformly convergent series for $0 < K < 1$.
For the statistic $v^*$ consider


Poot > t) = HKG)" D1 — Koy (BE + 8) 
h=0 





2 


—1 
rw + pr(™ > ‘)| i (* + =—* +h, 1) (K $ <1) 


—_ 


(3.2.2.2) 








where $y_0 = (1 + \lambda_1 t_0^2)^{-1}$. We notice from (3.2.2.2) and from the form of quantities
$\lambda_1$ and $\lambda_2$ (Section 3.1) that $P(v^* > t_0)$ depends on $K$ and $w$ only through the
product of $K$ and $1/w$. Consequently variations of $P$ w.r.t. $1/w$ for fixed $K$
are the same as those of $P$ w.r.t. $K$ for fixed $w$. Thus we may directly infer that
$P(v^* > t_0)$ will be insensitive to the variations of $w$. Table VII illustrates the
nature of variations in the probability of Type I error in the
tests based on $t^*$ and $v^*$ in relation to $w$.


TABLE VII
Variations in the probability of Type I error of $t^*$ and $v^*$
($K = 2$; $n_1 = n_2 = 7$; $t_0 = 1.782$)

w                0        .25      .5       1        2        ∞
P(t* > t_0)      .0259    .0358    .0427    .0512    .0594    .0866
P(v* > t_0)      .0625    .0570    .0539    .0512    .05      .0625


It would appear that on the analogy of statistics $t_1$ and $v$ for the comparison of
two means one could guess about the sensitive nature of $t^*$ in relation to the
variations of the 'nuisance' parameter $K$. The additional drawback in $t^*$ which
stems from the monotone nature of its variations with respect to $w$ is a further
warning against the use of a $t^*$ type statistic for the hypothesis $\beta_1 = \beta_2$ when
$\sigma_1^2 \ne \sigma_2^2$.

4. Hypothesis of equality of two linear regression functions when variances
are unequal.


4.1. The statistic Z. (For notation refer to Sections 2.1 and 3.1.) Consider the
model given in Sections 1 and 3 for the comparison of two regression coefficients.
If the variances are equal, the statistic based on the likelihood ratio criterion
for the composite hypothesis $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$ is given by

$Z = \dfrac{(\bar{y}_1 - \bar{y}_2)^2 (n_1 + 1)(n_2 + 1)(n_1 + n_2 + 2)^{-1} + (b_1 - b_2)^2 M_1 M_2 (M_1 + M_2)^{-1}}{S(y - Y)^2 + S'(y' - Y')^2}.$

The quantity $Z$ is distributed like the ratio of two independently distributed $\chi^2$'s
and consequently its distribution is precisely determined under the hypothesis.
If $\sigma_1^2 \ne \sigma_2^2$, $Z$ can be put in the form

$Z = (a_1\chi^2_{1,1} + a_2\chi^2_{2,1})(K\chi^2_{3,n_1-1} + \chi^2_{4,n_2-1})^{-1},$

which is now distributed as the ratio of 'mixtures' of independently distributed
$\chi^2$'s with d.o.f. indicated in the second subscripts and where

$a_1 = [n_1 + 1 + K(n_2 + 1)](n_1 + n_2 + 2)^{-1},$
$a_2 = (K + w)(1 + w)^{-1}.$

In the non-null case when $\alpha_1 \ne \alpha_2$, $\beta_1 \ne \beta_2$ the numerator of $Z$ is a mixture of
non-central squares. If we let $B(K, w, \delta, \Delta, n_1, n_2)$ denote the power function
of $Z$, following Robbins and Pitman [12] we obtain


B(K, w, 6, A, m1, n2) = Dy 2, DC; dipel (at > +h~ lk+jt+ 1) 





j=0 h=0 k=0 


- n+l 
I 1, w 
(x > ,o< MEN), 


(4.1.1) 





where 


"i bia 
9)’ I 5 , 
{= nel ED (1 — a;/az)’, 


xorg or J + i\(1 a 4) 
2 K 


‘i= ——_—_—_— ' 
r(™ = ') h! 

m= ee? (1D)*/k! (DP =8 
¢ = (i+ Z/a)-. 








4.2. Variations in the probability of Type I error and the power function of Z.
Corresponding to (4.1.1) we obtain the expression for the probability of Type I
error $P(Z > Z_0)$ by putting $D = 0$ and $k = 0$. It has not been possible to establish
any definite law concerning the behavior of the probability of Type I error
and the power function w.r.t. the 'nuisance' parameter $K$. However we shall
presently establish their monotone dependence on the variable parameter $w$.

We differentiate $P(Z > Z_0)$ with respect to $w$ and after simplification obtain


ins aT (j + 3) a\? ay. a,\*" 
5p = K — Wla/aytas (2 -2) - $5(1 - =) | 


7 (mt nm as . =~ 2 (a,/a2)* rj + 3/2) 
(mam ss i +1) (K+ ul +0) *- jira) 


[1(mt eho 274 1)-1("4™ 441,542) ] <0 


for $K > 1$, $w < \dfrac{n_1 + 1}{n_2 + 1}$. Similarly, by utilizing an appropriate expression for
$P(Z > Z_0)$ for $K > 1$, $w \ge \dfrac{n_1 + 1}{n_2 + 1}$, we can show that $\dfrac{\partial P}{\partial w} < 0$.
For the case $K < 1$ it can be shown that $P(Z > Z_0)$ is a monotone increasing function
of $w$. This is also true of the dependence of the power function of $Z$ on $w$.
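
The role of the boundary $w = (n_1 + 1)/(n_2 + 1)$ can be seen from the numerator weights $a_1$ and $a_2$ of Section 4.1. The Python sketch below is an illustration added here, with a function name of our own; it evaluates the two weights exactly and shows that for $K > 1$ they coincide at that boundary, with $a_1 < a_2$ below it and $a_1 > a_2$ above it, while $a_2$ decreases as $w$ increases.

```python
from fractions import Fraction


def z_numerator_weights(K, w, n1, n2):
    """(a_1, a_2): weights of the two chi-squares in the numerator of Z."""
    K, w = Fraction(K), Fraction(w)
    a1 = (n1 + 1 + K * (n2 + 1)) / (n1 + n2 + 2)
    a2 = (K + w) / (1 + w)
    return a1, a2


if __name__ == "__main__":
    n1, n2, K = 5, 3, 2
    boundary = Fraction(n1 + 1, n2 + 1)
    for w in (Fraction(1), boundary, Fraction(2)):
        a1, a2 = z_numerator_weights(K, w, n1, n2)
        print(f"w = {w}: a1 = {a1}, a2 = {a2}")
```

When $K = 1$ both weights equal unity for every $w$, which is the equal-variance case in which $Z$ is a ratio of two independent $\chi^2$'s.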
4.3. Unbiasedness of Z. We differentiate (4.1.1) w.r.t. $\delta$ and $\Delta$ and after
simplification obtain first partial derivatives equal to zero and second partial
derivatives greater than zero at $\delta = 0$, $\Delta = 0$. Thus the power function of $Z$ has a
relative minimum at $\delta = 0$, $\Delta = 0$.

The author is greatly indebted to Professors Harold Hotelling and William G.
Madow for guidance in this research and to the referees for many useful
suggestions and criticisms.


REFERENCES

[1] B. L. WELCH, "The significance of the difference between two means when the population variances are unequal", Biometrika, Vol. 29 (1938), pp. 350-361.
[2] M. G. KENDALL, The Advanced Theory of Statistics, Vol. 2, C. Griffin and Co., 1946, pp. 112-115.
[3] P. L. HSU, "Contribution to the theory of 'Student's' t-test as applied to the problem of two samples", Stat. Res. Memoirs, Vol. 2 (1938), pp. 1-24.
[4] H. SCHEFFÉ, "On the solutions of the Behrens-Fisher problem based on the t-distribution", Annals of Math. Stat., Vol. 14 (1943), pp. 35-44.
[5] W. V. BEHRENS, "Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen", Landw. Jb., Vol. 68 (1929), pp. 807-837.
[6] R. A. FISHER, "The fiducial argument in statistical inference", Annals of Eugenics, Vol. 6 (1935), pp. 391-398.
[7] J. NEYMAN, "Statistical problems in agricultural experiments", Jour. Roy. Stat. Soc., Supp., Vol. 2 (1935), pp. 107-180.
[8] J. E. WALSH, "On the power efficiency of a t-test formed by pairing sample values", Annals of Math. Stat., Vol. 18 (1947), pp. 601-604.
[9] J. NEYMAN, "Fiducial argument and the theory of confidence intervals", Biometrika, Vol. 32 (1941), pp. 128 ff.
[10] P. V. SUKHATME, "On Fisher and Behrens' test of significance for the difference in means of two normal samples", Sankhyā, Vol. 4 (1938), pp. 39-48.
[11] R. A. FISHER AND F. YATES, Statistical Tables, Oliver and Boyd, 1943.
[12] H. ROBBINS AND E. J. G. PITMAN, "Application of the method of mixtures to quadratic forms in normal variates", Annals of Math. Stat., Vol. 20 (1949), pp. 552-560.


THE EXTREMAL QUOTIENT 


By E. J. GUMBEL AND R. D. KEENEY


New York City and Metropolitan Life Insurance Company 


Summary. The extremal quotient is defined as the ratio of the largest to the 
absolute value of the smallest observation. Its analytical properties for sym- 
metrical, continuous and unlimited distributions are obtained from a study of 
the auto-quotient defined as the ratio of two non-negative variates with identi- 
cal distributions. The relation of the two statistics is established by proving 
that, for sufficiently large samples from an initial distribution with median zero, 
the largest (or smallest) value may be assumed to be positive (or negative) 
and that the extremes are independent. It follows that the distribution and the 
probability of the extremal quotient possess certain symmetries, and that its 
median is unity. As many moments exist for the extremal quotient as moments 
and reciprocal moments exist simultaneously for the initial variate. The loga- 
rithm of the extremal quotient is symmetrically distributed. These properties 
hold for all continuous symmetrical unlimited variates which possess a mono- 
tonically increasing probability function. 

For the exponential type, the asymptotic distribution of the extremal quo- 
tient can only be expressed by an integral. In this case, no moments exist. For 
the Cauchy type, the asymptotic distribution is very simple, and the logarithm 
of the extremal quotient has the same distribution as the midrange for initial 
distributions of the exponential type. 

It is not necessary to consider asymmetrical distributions since, in this case, 
for sufficiently large samples, one of the extremes will outweigh the other, 
unless the distribution is nearly symmetrical or has rapidly varying tails. 


1. The auto-quotient and the extremal quotient. Let $x$ and $y$ be two independent
non-negative continuous variates, unlimited to the right. Let $f_1(x)$ and
$f_2(y)$ be the distributions (probability densities), and let $F_1(x)$ and $F_2(y)$ be
the probability functions. Then the joint distribution of the two variates is
their product. The quotient

(1.1)    $Q = x/y$

is also non-negative and unlimited to the right. Since

$x = yQ; \quad \partial x/\partial Q = y,$

the joint distribution $w(y, Q)$ of the quotient $Q$ and the variate $y$ is

(1.2)    $w(y, Q) = f_1(yQ)\, f_2(y)\, y,$

and the marginal distribution $h(Q)$ of the variate $Q$ alone becomes

(1.3)    $h(Q) = \int_0^\infty y\, f_1(yQ)\, f_2(y)\, dy.$

The quotient $Q$ possesses a mode if (and only if) $f_1(x)$ possesses a mode.
Assume now that the two variates $x$ and $y$ have the same distribution

(1.4)    $f_1(x) = f(x); \quad f_2(y) = f(y)$

with the same parameter values. The quotient of two variates with identical
distributions is henceforth called the auto-quotient $q_a$. It may be realized if there
are two independent series of observations taken from the same population and
ordered in time. Each value from the first series is divided by the corresponding
value from the second series. Another realization consists in dividing each value
obtained in one series of independent observations by every other value. A
third realization is obtained by considering two asymmetrical distributions
$f_1(x)$ and $f_2(y)$ where $x \ge 0$, $y \le 0$, and

(1.4')   $f_2(y) = f_1(-y).$

The two distributions are called mutually symmetrical, and the auto-quotient is

$q_a = x/(-y).$


From the definition of the auto-quotient it follows that the distribution of $q_a$
must be the same as the distribution of its reciprocal $r = 1/q_a$. The proof of this
statement is simple. Under the condition (1.4), the distribution $h(q_a)$ becomes,
from (1.3),

(1.5)    $h(q_a) = \int_0^\infty y\, f(y q_a)\, f(y)\, dy.$

The distribution $h_1(r)$ of the reciprocal is

$h_1(r) = \dfrac{1}{r^2}\int_0^\infty y\, f(y/r)\, f(y)\, dy.$

If $y/r$ is replaced by $z$, the distribution of $r$ is

(1.6)    $h_1(r) = h(q_a).$

Thus, the distribution of the auto-quotient of a non-negative unlimited variate
is invariant under a reciprocal transformation.

The shape of the distribution $h(q_a)$ and the location of the mode may be obtained
from the density of probability $h(1/q_a)$ at the value $1/q_a$ (which differs,
of course, from the distribution $h_1(r)$ of $r = 1/q_a$). From (1.5) we obtain

$h(1/q_a) = \int_0^\infty y\, f(y/q_a)\, f(y)\, dy.$

The transformation

$y/q_a = z; \quad dy = q_a\, dz,$

leads to

(1.7)    $h(1/q_a) = q_a^2\, h(q_a).$

This is a symmetry relation for the distribution of the auto-quotient of a
non-negative unlimited variate. If $q_a$ is larger than unity,

(1.8)    $h(1/q_a) > h(q_a).$

If the distribution $h(q_a)$ is continuous for all values of $q_a$, the derivative of
equation (1.7) with respect to $q_a$ leads, for $q_a = 1$, to

(1.9)    $h'(1) = -h(1).$

If the distribution $h(q_a)$ possesses a unique mode, it must be less than unity.
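
As a concrete check of (1.5), (1.7) and (1.9), take the exponential density $f(y) = e^{-y}$, an example chosen here for illustration and not discussed in the text. For this choice (1.5) integrates in closed form to $h(q_a) = (1 + q_a)^{-2}$. The Python sketch below, with helper names of our own, evaluates (1.5) by Simpson's rule on a truncated range and compares the result with the closed form and with the two symmetry relations.

```python
import math


def f(y: float) -> float:
    """Exponential density exp(-y), an illustrative choice of initial distribution."""
    return math.exp(-y)


def h(q: float, upper: float = 60.0, steps: int = 60000) -> float:
    """Evaluate (1.5) by composite Simpson's rule, truncating the integral at `upper`."""
    step = upper / steps
    total = upper * f(upper * q) * f(upper)  # endpoint term; the integrand is 0 at y = 0
    for i in range(1, steps):
        y = i * step
        total += (4 if i % 2 else 2) * y * f(y * q) * f(y)
    return total * step / 3.0


if __name__ == "__main__":
    q = 2.0
    print(h(q), 1.0 / (1.0 + q) ** 2)          # numerical value against closed form
    print(h(1.0 / q), q * q * h(q))            # relation (1.7)
    eps = 1e-3
    slope = (h(1.0 + eps) - h(1.0 - eps)) / (2.0 * eps)
    print(slope, -h(1.0))                      # relation (1.9)
```

In this example the density decreases throughout, so the mode lies at zero, below the value unity as the text requires.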
The moments $\overline{q_a^k}$ are, from (1.5),

$\overline{q_a^k} = \int_{q_a=0}^{\infty}\int_{y=0}^{\infty} q_a^k\, y\, f(q_a y)\, f(y)\, dy\, dq_a
= \int_{y=0}^{\infty} \dfrac{f(y)}{y^k}\int_{q_a=0}^{\infty} (q_a y)^k f(q_a y)\, d(q_a y)\, dy.$

The inner integral is the moment $\overline{y^k}$ of order $k$ of the initial variate $y$, and the
remaining integral is its reciprocal moment $\overline{y^{-k}}$ of order $-k$. Thus

(1.10)   $\overline{q_a^k} = \overline{y^k}\cdot\overline{y^{-k}} = \overline{q_a^{-k}}.$

The moments of order $k$ and of order $-k$ of $q_a$ exist if the moments and the
reciprocal moments of order $k$ for the initial variate exist simultaneously. The
second equation in (1.10) also follows immediately from the invariance of $q_a$
under a reciprocal transformation. Even if the initial distribution possesses all
moments, the mean $\bar{q}_a$ need not exist, and the same holds, of course, for the mean
error and the higher moments. The procedure, usual in economic and meteorological
statistics, of calculating the quotients of two series of independent positive
variables in order to test whether this ratio is constant may be misleading,
especially if the two series happen to be samples taken from the same population.
The theoretical mean need not exist, and the calculated mean of the observed
quotients need not characterize the relation between the two series.
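
Relation (1.10) can be illustrated with a gamma-distributed initial variate of unit scale, an example of our own choosing, for which the moment of order $k$ is $\Gamma(a + k)/\Gamma(a)$ and exists only when $a + k > 0$. The Python sketch below (function names are hypothetical) evaluates both factors of (1.10) and raises an error when the reciprocal moment does not exist.

```python
import math


def gamma_moment(k: float, shape: float) -> float:
    """Moment of order k of a gamma variate with the given shape and unit scale."""
    if shape + k <= 0:
        raise ValueError("moment of this order does not exist")
    return math.gamma(shape + k) / math.gamma(shape)


def auto_quotient_moment(k: float, shape: float) -> float:
    """Moment of order k of the auto-quotient, computed from (1.10)."""
    return gamma_moment(k, shape) * gamma_moment(-k, shape)


if __name__ == "__main__":
    print(auto_quotient_moment(1, 3.0))    # mean of q_a for shape 3
    print(auto_quotient_moment(-1, 3.0))   # same value, by the second equation in (1.10)
    try:
        auto_quotient_moment(1, 1.0)       # exponential case: reciprocal mean diverges
    except ValueError as err:
        print("shape 1:", err)
```

The exponential case (shape 1) shows the situation described in the text: the initial variate has moments of every positive order, yet the mean of the auto-quotient does not exist.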

The probability function $H(Q)$ of the quotient $Q$ obtained from (1.3) is

$H(Q) = \int_0^Q\int_0^\infty y\, f_1(zy)\, f_2(y)\, dy\, dz.$

Change of the order of integration leads to

$H(Q) = \int_0^\infty f_2(y)\, F_1(Qy)\, dy.$








The probability function $H(q_a)$ of the auto-quotient obtained from (1.4) is

(1.11)   $H(q_a) = \int_0^1 F(q_a y)\, dF(y).$

Integration by parts leads to

(1.12)   $H(q_a) = 1 - q_a\int_0^\infty F(y)\, f(q_a y)\, dy.$

The boundary conditions $H(0) = 0$; $H(\infty) = 1$ can immediately be verified if
the preceding equation is written in the form

(1.13)   $H(q_a) = 1 - \int_0^\infty F(z/q_a)\, f(z)\, dz.$
0 


The probability $H(q_a)$ possesses a symmetry relation which is analogous to
(1.7). The probability at the value $1/q_a$ is, from (1.11),

$H(1/q_a) = \int_0^\infty F(y/q_a)\, f(y)\, dy.$

If we introduce the variable of integration

$y = q_a z,$

we obtain from (1.12)

(1.14)   $H(q_a) = 1 - H(1/q_a).$

If $q_a$ is any quantile, such that $H(q_a) = P$, its reciprocal $1/q_a$ has the probability
$1 - P$. The first quartile (decile) is the reciprocal of the third quartile (ninth
decile), and so on.

For $q_a = 1$, equation (1.14) leads to

(1.14')  $H(1) = \tfrac{1}{2}.$

The median of the auto-quotient of a positive unlimited variate is unity. From
(1.9) it follows that the median surpasses the mode, if a unique mode exists.

Finally, equation (1.14) may be used to construct a symmetrical distribution.
If a new variate

(1.15)   $z = \lg q_a$

with the probability function $H^*(z)$ is introduced, the symmetry relation (1.14)
becomes

(1.16)   $H^*(z) = 1 - H^*(-z).$

The logarithm of the auto-quotient of a positive unlimited variate has a symmetrical
distribution about median zero. The geometric mean of $q_a$ exists and is
equal to unity.
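
The reciprocal pairing of quantiles can be made concrete with the exponential initial distribution $f(y) = e^{-y}$, again an illustrative choice of ours. For it (1.11) gives $H(q_a) = q_a/(1 + q_a)$, so the quantile with probability $P$ is $P/(1 - P)$. The short Python sketch below, with function names of our own, works in exact fractions.

```python
from fractions import Fraction


def H(q) -> Fraction:
    """Probability function of the auto-quotient for an exponential initial variate."""
    q = Fraction(q)
    return q / (1 + q)


def quantile(P) -> Fraction:
    """Value q_a with H(q_a) = P, for 0 < P < 1."""
    P = Fraction(P)
    return P / (1 - P)


if __name__ == "__main__":
    for P in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
        lower, upper = quantile(P), quantile(1 - P)
        print(f"P = {P}: quantile {lower}, complementary quantile {upper}, product {lower * upper}")
```

Each product equals unity, and the median is unity, in agreement with (1.14) and (1.14').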


These results hold if each observed value of a non-negative unlimited variate
is divided by each other observed value. They do not hold for the quotients of
two specific order statistics because, in general, the fundamental assumption of
independence no longer holds. However, some consequences for the quotients
of extreme $m$th values may be deduced.

Consider a symmetrical unlimited variate. Then the distribution ${}_m\varphi({}_m x)$
of the $m$th smallest value ${}_m x$, and the distribution $\varphi_m(x_m)$ of the $m$th largest value
$x_m$ are mutually symmetrical in the sense of (1.4'). Therefore the extremal
quotient

(1.17)   $q_m = \dfrac{x_m}{-{}_m x}$

may be interpreted as an auto-quotient provided that 1) the probability for
$x_m$ to be negative, and ${}_m x$ to be positive, may be neglected; 2) the distributions
of the $m$th smallest and the $m$th largest values are independent. Under these
conditions the distribution, the moments, and the probability function of the
extremal quotient are obtained from (1.5), (1.10), and (1.11) respectively, if
the initial distribution $f(y)$ is replaced by the distribution of the $m$th largest
values $\varphi_m(x_m)$. The symmetry relations (1.7) and (1.14) and their consequence,
that the median is equal to unity, hold in particular for $m = 1$, i.e. for the
extremal quotient proper.

The validity of the two conditions has now to be established. 

a) Consider a symmetrical distribution f(x) with median zero. Then the 
probability that the largest among n observations, x, , is equal to or less than a 
certain x, is 1 — F”(x). The probability P that the largest among n values is 
positive, i.e. larger than the median, is 


(1.18) P=1-2". 


If n is sufficiently large, this probability differs from unity by an amount that 
can be made as small as we please. Even for relatively small samples, say n = 20, 
the probability that the largest value will be positive is of the order $1 - 10^{-6}$. 
Thus, we expect only one largest value in a million samples of size 20 to be 
negative. The same argument shows that the smallest value $x_1$ may be expected to 
be negative. Thus the postulate 


(1.19)    $x_1 \le 0;\quad x_n \ge 0$, 


is a very weak restriction upon the sample size. If m is sufficiently small, the 
same result holds for the mth extremes. 
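
As a small numerical illustration of (1.18), the sketch below evaluates $P = 1 - 2^{-n}$ exactly for a few sample sizes. The function name `prob_largest_positive` is hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def prob_largest_positive(n: int) -> Fraction:
    """P = 1 - 2**(-n): probability that the largest of n values from a
    symmetrical distribution with median zero is positive, as in (1.18)."""
    return 1 - Fraction(1, 2 ** n)


for n in (5, 10, 20):
    p = prob_largest_positive(n)
    # Print the complementary probability, i.e. the chance of a negative maximum.
    print(n, float(1 - p))
```

For n = 20 the complementary probability is $2^{-20}$, slightly below one in a million, in line with the order of magnitude quoted in the text.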

b) It is known [7] that the joint distribution $w_n(x_1, x_n)$ of the extremes taken 
from an initial distribution of the exponential type converges, for sufficiently 
large samples, toward the product of the asymptotic distribution $\varphi(x_n)$ of the 
largest value, and ${}_1\varphi(x_1)$ of the smallest value. A similar theorem will now be 
proven for a general class of continuous distributions. 








528 E. J. GUMBEL AND R. D. KEENEY 


Let ${}_m x$ be the mth smallest observation; let $x_l$ be the lth largest observation 
where m and l are small compared to n, n being large. Then the joint distribution 
$w_n({}_m x, x_l)$ is 


(1.20)    $w_n({}_m x, x_l) = \dfrac{n!}{(m-1)!\,(l-1)!\,(n-m-l)!}\; F({}_m x)^{m-1}\,\bigl(F(x_l) - F({}_m x)\bigr)^{n-m-l}\,\bigl(1 - F(x_l)\bigr)^{l-1} f({}_m x)\, f(x_l)$. 


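Now the transformation 

(1.21)    $n\bigl(1 - F(x_l)\bigr) = \xi;\quad nF({}_m x) = \eta;\quad 0 \le \xi \le n,\quad 0 \le \eta \le n$ 


due to Cramér ([1], p. 371) is used. Then the joint distribution $v_n(\xi, \eta)$ of the 
new variates $\xi$ and $\eta$ becomes 


$v_n(\xi, \eta) = \dfrac{n!}{n^{m+l}\,(m-1)!\,(l-1)!\,(n-m-l)!}\; \eta^{m-1}\,\xi^{l-1}\left(1 - \dfrac{\xi + \eta}{n}\right)^{n-m-l}$ 


where m + l is small compared to n. As n increases, $v_n(\xi, \eta)$ converges to 


$v(\xi, \eta) = \left(\dfrac{\xi^{l-1} e^{-\xi}}{(l-1)!}\right)\left(\dfrac{\eta^{m-1} e^{-\eta}}{(m-1)!}\right)$, 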


so that in the limit $\xi$ and $\eta$ are independent. If now the mild restriction is 
imposed that F(x) be monotonically increasing, (1.21) defines a one to one 
transformation, and therefore there must exist an inverse function uniquely defining 
${}_m x$ as a function of $\eta$, and $x_l$ as a function of $\xi$. From the limiting independence 
of $\xi$ and $\eta$ the limiting independence of the extremes ${}_m x$ and $x_l$ follows at once. 
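
The convergence just described can be checked numerically. The sketch below assumes the exact density $v_n(\xi, \eta) = \frac{n!}{n^{m+l}(m-1)!(l-1)!(n-m-l)!}\,\eta^{m-1}\xi^{l-1}(1 - (\xi+\eta)/n)^{n-m-l}$ and its limit as a product of two gamma-type densities; the function names `v_n` and `v_limit` and the test point are illustrative choices, not part of the paper.

```python
import math


def v_n(xi, eta, n, m, l):
    """Exact joint density of xi = n(1 - F(x_l)) and eta = n F(_m x)."""
    # log of n! / (n**(m+l) (m-1)! (l-1)! (n-m-l)!), computed with lgamma
    log_c = (math.lgamma(n + 1) - math.lgamma(m) - math.lgamma(l)
             - math.lgamma(n - m - l + 1) - (m + l) * math.log(n))
    return (math.exp(log_c) * eta ** (m - 1) * xi ** (l - 1)
            * (1 - (xi + eta) / n) ** (n - m - l))


def v_limit(xi, eta, m, l):
    """Limiting density: product of independent factors in xi and eta."""
    def g(t, k):
        return t ** (k - 1) * math.exp(-t) / math.factorial(k - 1)
    return g(xi, l) * g(eta, m)


for n in (50, 1000, 100000):
    ratio = v_n(1.5, 0.7, n, m=2, l=3) / v_limit(1.5, 0.7, m=2, l=3)
    print(n, ratio)
```

The printed ratio approaches unity as n grows, which is the factorization that underlies the limiting independence of the two extremes.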

Thus the second condition is fulfilled, and the mth extremal quotient shares 
all properties of the auto-quotient. This holds also for initial symmetrical dis- 
tributions which do not possess asymptotic distributions of the extremes. 

In the following, the two types of initial distributions of an unlimited variate 
are considered for which asymptotic distributions of the extremes exist, namely, 
the exponential and the Cauchy type. For simplicity, only the extremal quotient 
proper, designated by q, is studied. The two asymptotic probabilities of the 
extremal quotients for these symmetrical distributions are obtained by introducing 
the asymptotic distributions of the largest value into the probability 
function (1.11) of the auto-quotient. 







2. Application to the exponential type. For symmetrical distributions of the 
exponential type the asymptotic distribution of the largest value is 


(2.1)    $\varphi(x) = \alpha \exp\bigl[-\alpha(x - u) - e^{-\alpha(x - u)}\bigr]$, 


where u and $\alpha$ are defined in terms of the initial probability F(x) and the initial 
distribution f(x) by 


(2.2)    $F(u) = 1 - 1/n;\quad \alpha = n f(u)$, 


n being the sample size. The distribution (2.1) will now be simplified by 
introducing a new parameter $\lambda$ defined by 


(2.3)    $e^{\alpha u} = \lambda > 0$. 


THE EXTREMAL QUOTIENT 529 


To see the meaning of $\lambda$, consider Laplace's first distribution, then the 
so-called logistic [6], and the normal distributions, all of which are of the exponential 
type. In the first two cases we obtain, from (2.2), after some calculations, 


(2.4)    $\alpha = 1,\ u = \lg n - \lg 2;\qquad \alpha = 1 - 1/n,\ u = \lg (n - 1)$, 


whereas for the normal distribution, we have asymptotically 


$\alpha = u = \sqrt{2 \lg \bigl(n/\sqrt{2\pi}\bigr)}$ 


and 
(2.4')    $\lambda = n^2/(2\pi)$. 


For these distributions, and interpreted in this sense, $\lambda$ is of the order of the 
sample size or its square. 
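
To make the order of magnitude of $\lambda$ concrete, the sketch below evaluates $\lambda = e^{\alpha u}$ from (2.3) with the values of $\alpha$ and u quoted in (2.4) and (2.4'). The function names are hypothetical, and the normal case uses the asymptotic expressions only.

```python
import math


def lam_laplace(n):
    # alpha = 1, u = lg n - lg 2, hence lambda = n / 2
    return math.exp(1.0 * (math.log(n) - math.log(2)))


def lam_logistic(n):
    # alpha = 1 - 1/n, u = lg(n - 1), hence lambda = (n - 1)**(1 - 1/n)
    alpha, u = 1 - 1 / n, math.log(n - 1)
    return math.exp(alpha * u)


def lam_normal(n):
    # asymptotically alpha = u = sqrt(2 lg(n / sqrt(2 pi))), hence lambda = n**2 / (2 pi)
    u = math.sqrt(2 * math.log(n / math.sqrt(2 * math.pi)))
    return math.exp(u * u)


for n in (20, 100, 1000):
    print(n, lam_laplace(n), lam_logistic(n), lam_normal(n))
```

For the first two distributions $\lambda$ grows like n, while for the normal distribution it grows like $n^2$.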


From (2.3) and (2.1) the distribution $\varphi(x)$ and the probability function $\Phi(x)$ 
are 


(2.5)    $\varphi(x) = \alpha\lambda \exp\bigl[-\alpha x - \lambda e^{-\alpha x}\bigr];\quad \Phi(x) = \exp\bigl[-\lambda e^{-\alpha x}\bigr]$. 


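In order to fulfill the condition (1.19), namely $\Phi(0) = 0$, the distribution $\varphi(x)$ 
must be truncated at x = 0. This leads to the truncated distribution $\varphi_a(x)$ and 
the truncated probability $\Phi_a(x)$ where 


(2.6)    $\varphi_a(x) = \dfrac{\alpha\lambda \exp\bigl[-\alpha x - \lambda e^{-\alpha x}\bigr]}{1 - e^{-\lambda}};\quad \Phi_a(x) = \dfrac{\exp\bigl[-\lambda e^{-\alpha x}\bigr] - e^{-\lambda}}{1 - e^{-\lambda}}$. 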


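The asymptotic probability function $H_\lambda(q)$ for the extremal quotient of a 
symmetrical variate of the exponential type is now obtained from (1.11), if y, f(y), 
and F(y), are replaced by x, $\varphi_a(x)$ and $\Phi_a(x)$, respectively, and the index a is 
dropped. Consequently, from (2.6), 


$H_\lambda(q) = \dfrac{1}{(1 - e^{-\lambda})^2}\displaystyle\int_0^\infty \alpha\lambda \exp\bigl[-\alpha x - \lambda e^{-\alpha x} - \lambda e^{-\alpha q x}\bigr]\,dx \;-\; \dfrac{e^{-\lambda}}{(1 - e^{-\lambda})^2}\displaystyle\int_0^\infty \alpha\lambda \exp\bigl[-\alpha x - \lambda e^{-\alpha x}\bigr]\,dx$. 

The transformation 

$e^{-\alpha x} = z;\quad \alpha e^{-\alpha x}\,dx = -dz$ 
leads to 
(2.7)    $H_\lambda(q) = \dfrac{\lambda}{(1 - e^{-\lambda})^2}\displaystyle\int_0^1 e^{-\lambda(z + z^q)}\,dz \;-\; \dfrac{e^{-\lambda}}{1 - e^{-\lambda}}$. 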

This probability of the extremal quotient for initial symmetrical distributions 
of the exponential type is not truly asymptotic since the parameter $\lambda$ depends 
upon n. (See Addendum). 

Unfortunately, the expression (2.7) cannot be integrated in closed form. Therefore the 
probability function has to be studied in an analytic way. For this purpose we first 
recall the general properties 

$H_\lambda(0) = 0;\quad H_\lambda(1) = \tfrac{1}{2};\quad H_\lambda(\infty) = 1$, 


valid for any value of $\lambda$. Furthermore, for any $\lambda$, we have the symmetry 
relation (1.14). These properties can be verified at once from (2.7). 
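
The general properties above can also be checked by numerical quadrature. The sketch below assumes that (2.7) reads $H_\lambda(q) = \frac{\lambda}{(1-e^{-\lambda})^2}\int_0^1 e^{-\lambda(z+z^q)}dz - \frac{e^{-\lambda}}{1-e^{-\lambda}}$; the composite Simpson rule and the names `simpson` and `H` are illustrative choices, not the authors' method.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def H(q, lam):
    """Probability function of the extremal quotient, form (2.7) as assumed above."""
    e = math.exp(-lam)
    integral = simpson(lambda z: math.exp(-lam * (z + z ** q)), 0.0, 1.0)
    return lam * integral / (1 - e) ** 2 - e / (1 - e)


lam = 8.0
print(H(1.0, lam))                   # median property: close to 1/2
print(H(2.0, lam) + H(0.5, lam))     # symmetry (1.14): close to 1
print(H(50.0, lam))                  # approaches 1 for large q
```

The median value 1/2 and the symmetry $H_\lambda(q) + H_\lambda(1/q) = 1$ hold for any $\lambda$; the value 8 here is only an example.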










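The numerical values of $H_\lambda(q)$ can easily be calculated for q = 1/2 and q = 2. 
Consider a value of $\lambda$, say of the order 6, for which $e^{-\lambda}$ may be neglected 
against unity. Then formula (2.7) may be written 


(2.8)    $H_\lambda(2) = \displaystyle\int_0^1 \lambda e^{-\lambda(z + z^2)}\,dz = \sqrt{\lambda}\, e^{\lambda/4} \displaystyle\int_0^1 e^{-\lambda(z + 1/2)^2} \sqrt{\lambda}\,dz$. 


If we introduce 

$\sqrt{\lambda}\,(z + \tfrac{1}{2}) = \dfrac{t}{\sqrt{2}};\quad \sqrt{\lambda}\,dz = \dfrac{dt}{\sqrt{2}}$, 


the probability $H_\lambda(2)$ becomes a difference of two normal probability integrals, 

$H_\lambda(2) = \sqrt{\pi\lambda}\; e^{\lambda/4}\Bigl[\bigl(1 - F(\sqrt{\lambda/2})\bigr) - \bigl(1 - F(3\sqrt{\lambda/2})\bigr)\Bigr]$, 

where F stands for the normal probability function. 


The second expression may be neglected compared to the first one for $\lambda \ge 4$, 
whence 


(2.9)    $H_\lambda(2) = \sqrt{\dfrac{\lambda}{2}}\; e^{\lambda/4} \displaystyle\int_{\sqrt{\lambda/2}}^{\infty} e^{-t^2/2}\,dt$. 

The symmetry relation (1.14) leads to the knowledge of $H_\lambda(1/2)$. Thus the three 
probabilities $H_\lambda(1/2)$, $H_\lambda(1)$, and $H_\lambda(2)$ are known. 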
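
The approximation (2.9) is easy to evaluate with the complementary error function. The sketch below assumes the form $H_\lambda(2) = \sqrt{\lambda/2}\,e^{\lambda/4}\int_{\sqrt{\lambda/2}}^{\infty} e^{-t^2/2}dt$; the function name `h2_approx` is hypothetical.

```python
import math


def h2_approx(lam):
    """Approximation (2.9) to H_lambda(2), valid when exp(-lambda) is negligible."""
    x = math.sqrt(lam / 2)
    # integral of exp(-t**2 / 2) from x to infinity, written with erfc
    tail = math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))
    return x * math.exp(lam / 4) * tail


for lam in (18, 32):
    print(lam, round(h2_approx(lam), 5), round(1 - h2_approx(lam), 5))
```

The second printed column is $H_\lambda(2)$ and the third is $H_\lambda(1/2) = 1 - H_\lambda(2)$ by the symmetry relation (1.14).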

To see the influence of $\lambda$ on $H_\lambda(2)$, we use a method due to R. D. Gordon [4]. 
This author considers a function $R_x$ defined by 


(2.10)    $R_x = e^{x^2/2} \displaystyle\int_x^\infty e^{-t^2/2}\,dt,\quad x > 0$, 
and proves that 
Comin Lek Bae +e re 
dx dx? dix 
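It follows that 

$\dfrac{d}{dx}\,(x R_x) > 0$. 

If we substitute $\sqrt{\lambda/2}$ for x, this inequality may be written, from (2.9) and (2.10), 

$\dfrac{d}{d\sqrt{\lambda/2}}\left(\sqrt{\lambda/2}\; e^{\lambda/4} \displaystyle\int_{\sqrt{\lambda/2}}^{\infty} e^{-t^2/2}\,dt\right) = \dfrac{dH_\lambda(2)}{d\sqrt{\lambda/2}} > 0$. 


Consequently $H_\lambda(2)$ increases with $\lambda$ whereas, from (1.14), the probability 
$H_\lambda(1/2)$ decreases with $\lambda$. The following table gives the probabilities $H_\lambda(2)$ and 
$H_\lambda(1/2)$, from (2.9) and (1.14), and their differences 


(2.11)    $P_\lambda(2) = H_\lambda(2) - H_\lambda(1/2)$. 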




Asymptotic probabilities of the extremal quotient for symmetrical distributions of 
the exponential type 


Parameter      Probabilities (2.9), (1.14)        Probability (2.11) 
$\lambda$        $H_\lambda(2)$      $H_\lambda(1/2)$        $P_\lambda(2)$ 
    8          .84376        .15624             .68752 
   18          .91377        .08623             .82754 
   32          .94661        .05339             .89322 
   50          .96438        .03562             .92876 
   72          .97427        .02573             .94854 
   98          .98087        .01913             .96174 


The approximate shape of $H_\lambda(q)$ is traced, for $\lambda$ = 8, ..., 98, and 1/2 < q < 2 
in Graph (1). Since we know from (1.16) that lg q has a symmetrical distribution, 
we use a logarithmically normal probability paper where q is plotted on 
the abscissa in a logarithmic scale, and $H_\lambda(q)$ is plotted on the ordinate in a 
normal probability scale. The probability $P_\lambda(2)$ for any value of q to be 
contained in the interval 1/2 < q < 2 increases with $\lambda$, i.e., with the sample size, and 
the distribution of the extremal quotient contracts. 


[Graph (1): Asymptotic probability of the extremal quotient for the exponential type. 
Abscissa: extremal quotient q; ordinate: probability $H_\lambda(q)$.] 


If the initial distribution is unknown, the parameter $\lambda$ has to be estimated 
from the observed extremal quotients. Equation (2.11) may be used for this 
purpose. We calculate the observed relative frequency $P_\lambda(2)$ of extremal 
quotients contained between q = 1/2 and q = 2, and substitute it for the probability 
$P_\lambda(2)$. To facilitate this estimate of $\lambda$, we trace $P_\lambda(2)$ against $\lambda$ in graph (2). 
The probability $P_\lambda(2)$ is traced on the ordinate in linear scale, and the parameter 
$\lambda$ is traced on the abscissa in inverse scale. Thus $\lambda$ is easily estimated from the 
observed relative frequency $P_\lambda(2)$. 


[Graph (2): Estimation of the parameter $\lambda$. 
Abscissa: parameter $\lambda$ (inverse scale); ordinate: probability $P_\lambda(2)$ (linear scale).] 


The distribution $h_\lambda(q)$ of the extremal quotient obtained by differentiating 
the probability function (2.7) with respect to q is 


(2.12)    $h_\lambda(q) = \dfrac{1}{(1 - e^{-\lambda})^2} \displaystyle\int_0^1 \lambda^2 e^{-\lambda(z + z^q)}\, z^q\, (-\lg z)\,dz$. 


The symmetry relation (1.7) is easily verified. We now investigate the boundary 
value $h_\lambda(0)$ and prove that 


(2.13)    $\lim_{q \to 0} h_\lambda(q) = h_\lambda(0)$. 


This is not obvious, since $z^q$ becomes indeterminate if both z and q vanish. For 
the proof of (2.13), consider the integral 


(2.14)    $I = \lambda \displaystyle\int_0^1 e^{-\lambda z}\,(-\lg z)\,dz$ 







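or 
(2.15)    $I = (1 - e^{-\lambda}) \lg \lambda - \gamma + e^{-\lambda} \lg \lambda - \mathrm{Ei}(-\lambda)$. 


The last term, the exponential integral, is positive. The value of $h_\lambda(0)$ is thus, 
from (2.12) 


(2.16)    $h_\lambda(0) = \dfrac{\lambda e^{-\lambda}\bigl(\lg \lambda - \gamma - \mathrm{Ei}(-\lambda)\bigr)}{(1 - e^{-\lambda})^2}$. 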


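The difference 

$\Delta = (1 - e^{-\lambda})^2 \bigl(h_\lambda(q) - h_\lambda(0)\bigr)$ 


becomes, from (2.12), (2.15) and (2.16), by the use of the mean value theorem 
and after expansion 


$\Delta = f(\lambda) \displaystyle\int_0^1 \bigl(e^{-\lambda z^q} z^q - e^{-\lambda}\bigr)\,dz = f(\lambda) \displaystyle\sum_{\nu=0}^{\infty} \dfrac{(-1)^\nu \lambda^\nu}{\nu!} \left(\dfrac{1}{(\nu + 1)q + 1} - 1\right)$, 
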
where $f(\lambda)$ is a positive function. Since the series is absolutely convergent, the 
difference $\Delta$ vanishes for q = 0, and the density of probability for q = 0 is given 
by (2.16). The condition $h_\lambda(0) \ge 0$, valid for any distribution, is met provided 
that 


(2.17)    $\lambda > 1.794$. 


By virtue of (2.4) this is a (weak) condition concerning the sample size. From 
(2.16) it follows that $h_\lambda(0)$ does not vanish although its numerical value is very 
small. 

The existence of at least one mode follows from the fact that the distribution 
$h_\lambda(q)$ is continuous, very small for q = 0, and vanishes for q = $\infty$. Equation 
(1.9) proves that any mode is less than unity. The distribution contracts for 
increasing values of the parameter. Therefore the mode approaches the median 
with increasing sample size. 

Since the distributions of the exponential type do not possess reciprocal 
moments it follows from (1.10) that the distribution $h_\lambda(q)$ does not possess moments. 
The mean extremal quotient $\bar{q}$ diverges. Because the logarithmically normal 
distribution used in graph (1) as first approximation to the distribution $h_\lambda(q)$ 
possesses all moments, the distribution $h_\lambda(q)$ has a much longer tail than the 
logarithmically normal one. 


3. Application to the Cauchy type. For the exponential type, the asymptotic 
distribution of the extremal quotient can only be expressed in the form of an 
integral containing a parameter \ which is a function of the sample size. For the 
Cauchy type, to be defined in the following, the asymptotic distribution will 
turn out to be very simple. 










A distribution of a variate $x \ge 1$ was said [5] to be of the Pareto type if 


(3.1)    $\lim_{x \to \infty} x^k \bigl(1 - F(x)\bigr) = A;\quad k > 0;\quad A > 0$. 


We now say that a variate is of the Cauchy type if it is unlimited, continuous, 
subject to (3.1), and symmetrical about zero. Distributions of the Pareto and 
the Cauchy type do not possess moments of an order equal to or larger than k. 
However, not all unlimited symmetrical distributions with a finite number of 
moments are of the Cauchy type. 

The simplest example of such a distribution is the Cauchy distribution itself 


(3.2)    $f(x) = \dfrac{1}{\pi(1 + x^2)}$, 


which possesses no moments. For large absolute values of x, the usual expansion 
leads to 


$F(x) = \tfrac{1}{2} + \dfrac{1}{\pi} \operatorname{arc\,tg} x$, 


$F(x) = 1 - \dfrac{1}{\pi x} + O(x^{-3});\quad F(-x) = \dfrac{1}{\pi x} + O(x^{-3})$. 


If the factors $O(x^{-3})$ are neglected, the parameters A and k in (3.1) are 
(3.2')    $A = \pi^{-1};\quad k = 1$. 


For the Cauchy type, the asymptotic probability $\Pi(x)$ and distribution $\pi(x)$ 
of the largest value $x = x_n$, established by Fréchet [3], R. A. Fisher [2] and R. von 
Mises [8] are 


(3.3)    $\Pi(x) = \exp\left[-\left(\dfrac{u}{x}\right)^k\right];\quad \pi(x) = \dfrac{k}{u}\left(\dfrac{u}{x}\right)^{k+1} \exp\left[-\left(\dfrac{u}{x}\right)^k\right]$, 


where u is defined by (2.2). 

The condition (1.19) is fulfilled for any sample size which is so large that the 
asymptotic distribution of the extremes may be used. The asymptotic 
probability $H_k(q)$ of the extremal quotient for the Cauchy type is obtained from (1.11), 
if y, f(y) and F(y) are replaced by x, $\pi(x)$, and $\Pi(x)$, respectively, where the 
indices n and a are omitted. Consequently, from (3.3), 


$H_k(q) = \displaystyle\int_0^\infty \dfrac{k}{u}\left(\dfrac{u}{x}\right)^{k+1} \exp\left[-\left(\dfrac{u}{x}\right)^k - \left(\dfrac{u}{qx}\right)^k\right] dx$. 


From the transformation 


$\left(\dfrac{u}{x}\right)^k = z;\quad \dfrac{k}{u}\left(\dfrac{u}{x}\right)^{k+1} dx = -dz$, 


the asymptotic probability $H_k(q)$ and the asymptotic distribution $h_k(q)$ of the 
extremal quotient become simply 


(3.4)    $H_k(q) = \dfrac{q^k}{1 + q^k};\quad h_k(q) = \dfrac{k q^{k-1}}{(1 + q^k)^2};\quad q \ge 0$. 

Evidently, the symmetry relations (1.7) and (1.14) are fulfilled for any k. The 
graphs (3) and (4) show the distribution $h_k(q)$ and the probability $H_k(q)$ for 
the most interesting cases k = 1, 2, 3. From 


$k \lg q = \lg \dfrac{H_k(q)}{1 - H_k(q)}$ 


it follows: For k increasing, the probability $H_k(q)$ decreases for q < 1, and 
increases for q > 1. The distribution contracts with increasing values of the parameter 
k as shown in the graphs (3) and (4). The more moments that exist in the initial 
distribution, the more concentrated is the distribution of the extremal quotient. 
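
The closed forms in (3.4) lend themselves to a direct check. The sketch below assumes $H_k(q) = q^k/(1+q^k)$ and $h_k(q) = k q^{k-1}/(1+q^k)^2$; the function names mirror the symbols in the text and the value q = 1.7 is an arbitrary example.

```python
def H_k(q, k):
    """Probability function (3.4) of the extremal quotient, Cauchy type."""
    return q ** k / (1 + q ** k)


def h_k(q, k):
    """Density (3.4): derivative of H_k with respect to q."""
    return k * q ** (k - 1) / (1 + q ** k) ** 2


q = 1.7
for k in (1, 2, 3):
    symmetry = H_k(q, k) + H_k(1 / q, k)   # relation (1.14): equals 1
    print(k, H_k(q, k), symmetry, h_k(1.0, k))
```

For q > 1 the printed probability rises with k, while the density at the median q = 1 equals k/4, which illustrates the contraction of the distribution with increasing k.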









[Graph (3): Distributions of the extremal quotient for the Cauchy type. 
Abscissa: extremal quotient q; ordinate: density of probability $h_k(q)$.] 

[Graph (4): Asymptotic probability of the extremal quotient for the Cauchy type. 
Abscissa: extremal quotient q.] 


The density of probability 


$h_k(1) = k/4$ 

of the median obtained from (3.4) and (1.14') increases with k. The mode $\tilde{q}$ of 
the extremal quotient is obtained from (3.4). For k > 1 this leads to 

(3.5)    $\tilde{q}^{\,k} = \dfrac{k - 1}{k + 1} < 1$. 



For $k \le 1$ no mode exists, and the distribution diminishes with q. The larger 
k, the smaller is the distance from the median to the mode, and hence, the 
smaller the asymmetry. The density of probability of the mode increases with 
k, and the probability 


(3.6)    $H_k(\tilde{q}) = \tfrac{1}{2}(1 - 1/k)$ 


approaches 1/2. The distribution (3.4) belongs to the Pareto type and has no 
moments of an order equal to or greater than k. 
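
The mode formula can be verified against the probability function. The sketch below assumes (3.5) in the form $\tilde{q}^k = (k-1)/(k+1)$ and (3.6) in the form $H_k(\tilde{q}) = \tfrac12(1 - 1/k)$; `mode_q` is a hypothetical helper name.

```python
def mode_q(k):
    """Mode from (3.5): q**k = (k - 1)/(k + 1); defined only for k > 1."""
    if k <= 1:
        raise ValueError("no mode exists for k <= 1")
    return ((k - 1) / (k + 1)) ** (1 / k)


def H_k(q, k):
    """Probability function (3.4)."""
    return q ** k / (1 + q ** k)


for k in (2, 3, 10):
    q = mode_q(k)
    # Compare H_k at the mode with the closed form (3.6).
    print(k, q, H_k(q, k), 0.5 * (1 - 1 / k))
```

The mode stays below unity and moves toward the median as k grows, while the probability at the mode approaches 1/2.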

In N samples of sufficiently large size n, the largest quotient $q_N$, defined in 
the same way as u in equation (2.2), obtained from (3.4) 


(3.7)    $q_N^{\,k} = N - 1$ 


increases as a root of the number of samples, i.e. very quickly. The higher the 
order of the highest moments existing, the smaller will the expected largest 
quotient be. 

From (3.4) and the symmetry (1.14) we obtain 


(3.8)    $H_k(q) - H_k(1/q) = 1 - 2/(1 + q^k)$. 


The larger k, the larger is the percentage of the observations contained in the 
interval 1/q to q. 

For a systematic estimate of k, the transformation (1.15) is used. Formula 
(3.4) leads to the probability $H^*(z)$ and the distribution $h^*(z)$ where 


(3.9)    $H^*(z) = \dfrac{1}{1 + e^{-kz}};\quad h^*(z) = \dfrac{k e^{-kz}}{(1 + e^{-kz})^2}$. 


The logarithm of the extremal quotient for initial distributions of the Cauchy 
type (where no moments of an order equaling or exceeding k exist) has the 
logistic distribution [6], as does the midrange $v = x_1 + x_n$ for distributions of the 
exponential type (where all moments exist). The logarithm of the extremal 
quotient plotted on logistic probability paper should be scattered around a 
straight line. 

The order k of the lowest moment which diverges is obtained from the 
variance $\sigma_z^2$ of the distribution $h^*(z)$ which is [6] 


(3.10)    $\sigma_z^2 = \dfrac{\pi^2}{3k^2}$. 


For the estimate of k from (3.10), $\sigma_z^2$ is replaced by the estimate $s_z^2$ obtained from 


(3.11)    $s_z^2 = \dfrac{1}{N - 1} \displaystyle\sum_{\nu=1}^{N} (z_\nu - \bar{z})^2$. 
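
The estimation of k can be sketched on simulated data. The example below assumes that z = lg q, that the variance of z is $\pi^2/(3k^2)$ as for a logistic variate with scale 1/k, and that the sample variance uses the divisor N - 1; the simulated quotients, the seed, and the name `estimate_k` are illustrative only.

```python
import math
import random
import statistics


def estimate_k(quotients):
    """Estimate k from the sample variance of z = lg q via sigma_z**2 = pi**2 / (3 k**2)."""
    z = [math.log(q) for q in quotients]
    s = statistics.stdev(z)          # divisor N - 1
    return math.pi / (s * math.sqrt(3))


# Simulate quotients with H_k(q) = q**k / (1 + q**k) by inverting the probability function.
rng = random.Random(0)
k_true = 2.0
sample = []
for _ in range(20000):
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)   # keep u strictly inside (0, 1)
    sample.append((u / (1 - u)) ** (1 / k_true))

k_hat = estimate_k(sample)
print(k_hat)
```

With a sample of this size the estimate should fall close to the value of k used in the simulation; in practice the quotients would come from observed samples rather than from a random generator.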


For the Cauchy distribution itself, k = 1, and the probability and the 
distribution of the extremal quotient 


$H_1(q) = q/(1 + q);\quad h_1(q) = (1 + q)^{-2}$ 


are similar to the initial distribution. 
The asymptotic distribution of the extremal quotient for initial distributions 
of the Cauchy type contains one parameter only, the order of the lowest 
diverging moment in the initial distribution. All other traces of the initial distribution 
have disappeared. 


4. Comparison of the extremal properties for the two types of initial distribu- 
tions. Assume that the initial distribution is symmetrical, unlimited, and pos- 
sesses an asymptotic distribution of the extremes. This is not always fulfilled. 
All moments may exist, and yet the distribution may not belong to the expo- 
nential type. No moments may exist, and yet the distribution may not belong 
to the Cauchy type. If the assumption holds, the initial distribution belongs 
either to the Cauchy, or to the exponential type. 

We take N samples of size n, and estimate the median $\breve{X}$ of the population 
from the central value m of the N central values of the samples. Let $X_{1,\nu}$ and 
$X_{n,\nu}$ ($\nu = 1, 2, \ldots, N$) be the two extremes. If it happens for any $\nu$ that 


$X_{1,\nu} > m$ or $X_{n,\nu} < m$ 


the sample is too small, and its size has to be increased. The central value $\breve{q}$ of 
the observed extremal quotients $q_\nu = (X_{n,\nu} - m)/(m - X_{1,\nu})$ must be near 
unity. 
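
The procedure just described can be sketched as follows. The sample median is used here as the "central value", and the check is made slightly stricter than in the text (extremes equal to m are also rejected) to avoid a division by zero; both choices, and the name `extremal_quotients`, are assumptions of this illustration.

```python
import statistics


def extremal_quotients(samples):
    """Return (m, [q_1, ..., q_N]) with q_v = (X_n,v - m) / (m - X_1,v)."""
    m = statistics.median([statistics.median(s) for s in samples])
    quotients = []
    for s in samples:
        smallest, largest = min(s), max(s)
        if smallest >= m or largest <= m:
            raise ValueError("sample too small: extremes do not straddle m")
        quotients.append((largest - m) / (m - smallest))
    return m, quotients


samples = [[-2, 0, 4], [-1, 0.5, 1], [-3, 0, 3]]
m, q = extremal_quotients(samples)
print(m, q, statistics.median(q))
```

The median of the resulting quotients is the quantity that should lie near unity when the initial distribution is symmetrical.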

If the initial distribution is of the exponential type, all moments in the 
population exist, and the midrange has the logistic distribution. If the initial 
distribution is of the Cauchy type, no moments of an order equal to or greater than k 
exist, and the logarithm of the extremal quotient has the logistic distribution. The order k 
can be estimated from the variance (3.11). If all moments in the population 
diverge, the calculation of the observed moments is futile since they do not 
characterize the population. 


Addendum. The referee of this paper has suggested the following method for 
obtaining an asymptotic distribution of the extremal quotient for the 
exponential type. For large values of $\lambda$, formula (2.7) becomes, approximately, 


$H_\lambda(q) = \displaystyle\int_0^1 \lambda e^{-\lambda(z + z^q)}\,dz$. 


Let 
$\lambda z = y$. 

Then 

$H_\lambda(q) = \displaystyle\int_0^\lambda \exp\left\{-y\left[1 + \left(\dfrac{y}{\lambda}\right)^{q-1}\right]\right\} dy$. 


The further transformation 


$e^{t} = \lambda^{q-1};\quad q - 1 = t/\lg \lambda$, 


leads to the probability $H^*(t)$ of the variate t 


$H^*(t) = \displaystyle\int_0^\lambda \exp\bigl\{-y\bigl[1 + e^{-t} y^{t/\lg \lambda}\bigr]\bigr\}\,dy$, 


whence asymptotically for $\lambda \to \infty$ 

$H^*(t) = \displaystyle\int_0^\infty \exp\{-y(1 + e^{-t})\}\,dy = 1/(1 + e^{-t})$. 


Therefore the logistic distribution holds at the same time for both initial types, 
using the transformation $t = \alpha u (q - 1)$ for the exponential type, and the 
logarithmic transformation for the Cauchy type. 
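
The referee's limit can be examined numerically. The sketch below assumes the pre-limit form $H^*(t) = \int_0^\lambda \exp\{-y[1 + e^{-t} y^{t/\lg\lambda}]\}dy$, truncates the upper limit at 60 where the integrand is negligible, and uses a composite Simpson rule; the names `simpson` and `H_star` and the chosen values of t and $\lambda$ are illustrative.

```python
import math


def simpson(f, a, b, n=6000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def H_star(t, lam, upper=60.0):
    """Pre-limit probability of t = (q - 1) lg(lambda), for |t| < lg(lambda)."""
    eps = t / math.log(lam)

    def integrand(y):
        if y == 0.0:
            return 1.0                     # limit of the integrand as y -> 0
        return math.exp(-y * (1 + math.exp(-t) * y ** eps))

    return simpson(integrand, 0.0, min(upper, lam))


t = 1.0
for lam in (1e2, 1e4, 1e6):
    print(lam, H_star(t, lam), 1 / (1 + math.exp(-t)))
```

The approach to the logistic value is governed by $1/\lg\lambda$ and is therefore slow, so the limiting form should be read as a guide for very large $\lambda$ only.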


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946. 

[2] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution 
of the smallest and the largest member of a sample," Proc. Camb. Philos. Soc., 
Vol. 24 (1928), p. 180. 

[3] M. Fréchet, "Sur la loi de probabilité de l'écart maximum," Annales Soc. Polon. Math., 
Vol. 6 (1927). 

[4] R. D. Gordon, "Values of Mills ratio of area to boundary ordinate and of the normal 
probability integral for large values of the argument," Annals of Math. Stat., 
Vol. 12 (1941), pp. 364-366. 

[5] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941), 
pp. 163-190. 

[6] E. J. Gumbel, "Ranges and midranges," Annals of Math. Stat., Vol. 15 (1944), pp. 
414-422. 

[7] E. J. Gumbel, "On the independence of the extremes in a sample," Annals of Math. 
Stat., Vol. 17 (1946), pp. 78-81. 

[8] R. von Mises, "La distribution de la plus grande de n valeurs," Revue Math. de l'Union 
Interbalkanique, Vol. 1 (1936). 










ON A PRELIMINARY TEST FOR POOLING MEAN SQUARES 
IN THE ANALYSIS OF VARIANCE! 


By A. E. Paull 


Grain Research Laboratory, Winnipeg 









Summary. The paper describes the consequences of performing a preliminary 
F-test in the analysis of variance. The use of the 5% or 25% significance level 
for the preliminary test results in disturbances that are frequently large enough 
to lead to incorrect inferences in the final test. A more stable procedure is recom- 
mended for performing the preliminary test in which the two mean squares 
are pooled only if their ratio is less than twice the 50% point. 



















I. INTRODUCTION 


The problem discussed in this paper is one of a large class involving preliminary 
tests of significance. Studies of this type have recently been made by Bancroft 
[1] and Mosteller [2]. Bancroft dealt with a preliminary test for homogeneity 
of two variances, and a test of a regression coefficient. Mosteller dealt with the 
problem of pooling means from two normal populations having the same known 
variance. The present problem is an extension of Bancroft’s work from investiga- 
tions of the bias and variance of an estimate of variance, to investigations of the 
consequences of using that estimate in performing a further test of significance. 

The problem arises frequently in the analysis of variance. As a simple example, 
consider an experiment carried out to test the hypothesis that different labora- 
tories in a district all determine the protein content of wheat without systematic 
differences between laboratories. Three laboratories are selected at random 
and each is requested to analyze ten samples of the same wheat, five on each of 
two days. The analysis of variance would be set up in one of two ways: 












            MODEL I                                  MODEL II 
Source of variation         df   MS      Source of variation     df   MS 

Between laboratories         2   $v_3$     Between laboratories     2   $v_3$ 
Between days within labs.    3   $v_2$     Within laboratories     27   $(3v_2 + 24v_1)/27$ 
Within days                 24   $v_1$ 










The soundest procedure is to follow Model I in which the F-ratio, $v_3/v_2$, 
provides a valid though not very powerful test of the null hypothesis. But the 
investigator often doubts that this is the most effective form of analysis. His past 
experience may have shown that measurements of this kind seldom exhibit 
day-to-day variations appreciably greater than their within-day variations. 
If he is willing to accept this credible assumption, he adopts Model II because 








1 Based on a doctoral dissertation submitted to the Faculty of North Carolina State 
College of the University of North Carolina at Raleigh, N. C., in June, 1948. Published as 
Paper No. 107 of the Grain Research Laboratory, Board of Grain Commissioners, Winnipeg. 




this increases the degrees of freedom from 2 and 3 to 2 and 27. These two models 
may conveniently be called the "never pool" and the "always pool" procedures. 

The investigator often prefers what may be called a "sometimes pool" 
procedure. He starts with Model I and examines the null hypothesis that the 
variation between days is no greater than the variation within days by testing 
the F-ratio $v_2/v_1$. For this test, he selects a probability level $P_1$ that may be the 
5% or some higher level. If the hypothesis of this preliminary test is not rejected, 
his judgement has been substantiated and he adopts Model II and pools the 
two mean squares. If the hypothesis is rejected, he retains Model I since he 
concludes that $v_2$ alone is the only valid estimate of error. 

The following notation is introduced: 


Degrees of freedom    Mean square    Expected value of mean square 

      $n_3$                $v_3$                  $\sigma_3^2$ 
      $n_2$                $v_2$                  $\sigma_2^2$ 
      $n_1$                $v_1$                  $\sigma_1^2$ 

where $\sigma_1^2 \le \sigma_2^2 \le \sigma_3^2$. 

The mean squares $v_1$, $v_2$, and $v_3$ are assumed to be distributed as central 
chi-squares. This assumption is justified if the treatments (laboratories in the 
example) are selected at random from a population of treatments. But if, as is 
more frequently the case, the experimenter is interested only in specified treat- 
ments, the non-central chi-square model is the appropriate one. However, if 
the two cases are sufficiently parallel, as seems probable, conclusions drawn 
from the central model may be expected to apply to the non-central model. 

Let $\theta_{21} = \sigma_2^2/\sigma_1^2$ and $\theta_{32} = \sigma_3^2/\sigma_2^2$, and let $F(\nu_1, \nu_2; P)$ denote the value exceeded 
by F for $\nu_1$ and $\nu_2$ degrees of freedom with probability P. The rule of procedure 
for the "sometimes pool" test may be restated as follows: 

Reject the main hypothesis that $\sigma_3^2 = \sigma_2^2$ ($\theta_{32} = 1$) if 

$v_2/v_1 \ge F(n_2, n_1; P_1)$ and $v_3/v_2 \ge F(n_3, n_2; P_2)$ 


or if 


$v_2/v_1 < F(n_2, n_1; P_1)$ and $(n_1 + n_2)v_3/(n_1 v_1 + n_2 v_2) \ge F(n_3, n_1 + n_2; P_3)$. 


The "never pool" procedure in which $P_2$ is used, and the "always pool" procedure 
in which $P_3$ is used, may be considered as special cases of the "sometimes pool" 
procedure in which $P_1$ takes on its extreme values, 1 and 0 respectively. In 
practice, the probability levels $P_2$ and $P_3$ are usually the same; in the present 
study they are allowed to be different in case this greater flexibility should prove 
desirable. The objects of the investigation are: (a) to examine the Type I error 
under the above rule of procedure, i.e., to determine the frequency of rejecting 
the null hypothesis when it is true; and (b) to examine the behaviour of the power 
with particular reference to comparisons with the power of the "never pool" 
procedure. 
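
The rule of procedure can be written as a short decision function. In the sketch below the three critical values are supplied by the caller, for instance from F tables; the function name, argument names, and the mean squares in the usage lines are hypothetical, and the critical values shown are only approximate 5% points for (4, 20), (2, 4) and (2, 24) degrees of freedom.

```python
def sometimes_pool_rejects(v1, v2, v3, n1, n2, f_prelim, f_never, f_pooled):
    """Return True if the main hypothesis is rejected under the 'sometimes pool' rule.

    f_prelim : critical value F(n2, n1; P1) of the preliminary test
    f_never  : critical value F(n3, n2; P2) used when the mean squares are not pooled
    f_pooled : critical value F(n3, n1 + n2; P3) used after pooling
    """
    if v2 / v1 >= f_prelim:
        # Preliminary test significant: do not pool, test v3 against v2.
        return v3 / v2 >= f_never
    # Preliminary test not significant: pool v1 and v2 with weights n1 and n2.
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return v3 / pooled >= f_pooled


# Illustrative mean squares with n1 = 20, n2 = 4 (and n3 = 2 for v3).
print(sometimes_pool_rejects(1.0, 1.2, 6.0, 20, 4, 2.87, 6.94, 3.40))  # pooled branch
print(sometimes_pool_rejects(1.0, 4.0, 6.0, 20, 4, 2.87, 6.94, 3.40))  # unpooled branch
```

Setting `f_prelim` to zero reproduces the "never pool" procedure, and setting it to infinity reproduces the "always pool" procedure, which corresponds to the extreme values of $P_1$ mentioned above.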

The remainder of this paper is divided into three sections: Part II contains a 
general discussion of the results, conclusions and recommendations; and Part III 
illustrates the general conclusions with numerical examples. The derivation of 
distributions, proofs by elementary arguments of general qualitative results, and 
derivations of closed form expressions for $n_3 = 2$, are given in Part IV. 




































II. GENERAL DISCUSSION OF RESULTS, CONCLUSIONS AND RECOMMENDATIONS 


2.1. Criterion employed. In this part the principal results and recommenda- 
tions are discussed for the reader who is not interested in the mathematical 
details. To give results in a simple form is not easy, because of the many variables 
(the P's, the $\theta$'s, and the n's) that enter into the problem. It may be helpful 
to consider what is wrong with the “always pool” test, and then to state the 
properties which the preliminary test must have if it is to be regarded as useful 
and successful. 

If the "always pool" procedure is employed when in fact $\sigma_2^2$ is greater than 
$\sigma_1^2$, i.e. $\theta_{21} > 1$, the denominator in the final F test tends to be too small. Thus 
the final F test gives too many significant results when its null hypothesis is 
true and if $\theta_{21}$ is great enough, there is no bound to this hidden distortion of the 
significance level. A test which the research worker thinks is being made at the 
5% level might actually be at, say, the 47% level. 

The preliminary test represents an attempt to avoid this alarming disturbance, 
since if $\theta_{21}$ is very large the test is expected to warn against pooling. Such a 
procedure, however, can not be expected to remove this disturbance completely, 
and it does not do so, but to be successful it should keep the true or effective 
significance level of the final F test close to the nominal level at which the 
research worker thinks he is working. 

A second requirement is that the preliminary test should increase the power in 
the final F test relative to the power of the “never pool” test. When the powers of 
the "sometimes pool" and "never pool" tests are compared, it is important to 
make the comparison at the same significance level. Suppose the preliminary test 
shifts the significance level of the final F test from the 5% to the 6% level—a 
disturbance that for some uses would not be regarded as serious. In this event the 
“sometimes pool” test (at the 6% level) would tend to be more powerful than 
the ‘never pool” test at the 5% level, because an increase in significance level 
generally results in an increase in power. But unless the “‘sometimes pool’’ test 
has more power than a “never pool” test made also at the 6% level, it has no 
real advantage over the “never pool” procedure. 









2.2. Effect of preliminary tests made at the 5% level. Probably the most 
common procedure in practice is to perform the preliminary test at the 5% level 
(i.e. $P_1 = .05$) and, whether pooling is prescribed or not, to conduct the final F 
test also at the 5% level (i.e. $P_2 = P_3 = .05$). Such a procedure, except when 
$\theta_{21}$ is near one and the null hypothesis is true, results in the null hypothesis being 
rejected more frequently than if pooling is never resorted to. 

When the ratio $\theta_{21}$ is equal to one, so that routine pooling would be valid, the 
preliminary test is effective. The true significance level of the final F test is 
decreased slightly, but is always confined between the 5% and the 4.75% levels. 
Further, the power is always greater than that of the "never pool" test made 
at the same significance level. 

As $\theta_{21}$ increases from 1, the true significance level of the final F test increases 
to a maximum and then slowly decreases to 5%. Unfortunately the maximum 
need not be near to 5%: in the example presented later it is about 15%, and for a 
broad range of values of $\theta_{21}$ the true significance level is higher than 10%. 
Comparison with the power of the "never pool" test is also unfavorable to the 
"sometimes pool" test. For values of $\theta_{21}$ near 1, the "sometimes pool" test has the 
higher power, but as $\theta_{21}$ becomes larger the advantage passes to the "never 
pool" test. 

When $\theta_{21}$ is very large there is, as would be expected, little disturbance. The 
preliminary test seldom prescribes pooling, so that the properties of the 
"sometimes pool" test are very similar to those of the "never pool" test, although the 
"never pool" procedure yields the slightly higher power. 

The main objection to the use of the "sometimes pool" test is associated with 
the intermediate values of $\theta_{21}$. If over a series of experiments $\theta_{21}$ has a moderate 
value greater than one, the "sometimes pool" test at the 5% levels yields more 
apparently significant results than are anticipated, and is also less powerful 
than a corresponding "never pool" test. The magnitude of these undesirable 
properties can be reduced somewhat by increasing the significance level of the 
preliminary test. 


2.3. Effect of preliminary tests made at the 25% level. Use of the 25% 
instead of the 5% significance level for the preliminary test reduces, in general, the 
probability of rejecting the hypothesis. This reduction, at intermediate values 
of $\theta_{21}$, results in a reduction of the extreme disturbances. When the ratio $\theta_{21}$ is 
equal to one, however, the effects are not as favourable. If the hypothesis is 
true, still fewer apparently significant results occur. A final test being carried 
out at the 5% level can now have an effective significance level close to 3.75%. 
If the hypothesis is false, the test is still more powerful than a corresponding 
"never pool" test but the gain is not as great as when a preliminary test at the 
5% level is employed. Since most experimenters desire a reasonable amount of 
protection against an error in judgement of the true value of $\theta_{21}$, the reduction 
in disturbances for intermediate values of $\theta_{21}$, resulting from the use of the 25% 
rather than the 5% level, would be judged to outweigh the disadvantages of the 
compensating factors. 


2.4. Effect of further increases in significance level. Increasing $P_1$, the 
significance level of the preliminary test, decreases the probability of rejecting 
the hypothesis only to the point where a critical value of $P_1$ is reached. Increasing 
$P_1$ beyond this value results in an increase in the probability of rejection. The 
properties of a "sometimes pool" test in which $P_1$ is less than this critical value differ, 
in general, from those of a test in which $P_1$ is greater than it. 






























Tests of the former type, which are referred to here as Class A tests, are the 
tests commonly encountered in practice. Considering, for example, a test in 
which $P_2 = P_3 = .05$ and $n_1 = 20$, $n_2 = 4$, $n_3 = 2$, we find the critical value 
to be .77, a figure much larger than the values .05 or .25 customarily chosen 
for $P_1$. The major portion of the present discussion deals with Class A tests. 
Tests in which $P_1$ is greater than the critical value are referred to as Class B tests 
and discussion of their properties is relegated to a later section. An expression for 
evaluating the critical value is given in Subsection 4.3. 

2.5. Effect of $P_2$, $P_3$. The probability levels ($P_2$, $P_3$) used for the
final test determine the properties of the “sometimes pool” test for extreme
values of $\theta_{21}$. When $\theta_{21}$ is equal to one, the effective
significance level is less than the nominal value $P_3$, but is not less than
$(1 - P_1)P_3$. The power of such a test is greater than the power of a
corresponding “never pool” test, but less than the power of a test in which one
always pools and uses the $P_3$ level. For very large values of $\theta_{21}$ the
behaviour of the “sometimes pool” test approaches, in all respects, the
behaviour of a “never pool” test at the $P_2$ level.

2.6. Effect of $n_2$, $n_1$. The degrees of freedom $n_2$ and $n_1$, associated
with the mean squares that are sometimes pooled, clearly affect the magnitude of
the disturbance. Because analytic investigation becomes complex, the following
remarks are based on conjectures arising out of examination of a number of
numerical examples.

A large value of $n_2$ is desirable in two respects. As $n_2$ becomes larger the
preliminary test becomes more powerful and pooling is prescribed less often. In
addition, when pooling is prescribed the pooled mean square is further weighted
in favour of the valid error $\sigma_2^2$. Both factors contribute towards a
decrease in bias of the error mean square, with a consequent reduction in the
disturbance introduced into the final test.

The effect of $n_1$ is not as simple. As $n_1$ becomes larger the preliminary
test again becomes more powerful and pooling is prescribed less often. But when
pooling is prescribed, the pooled mean square in this case is further weighted in
favour of $\sigma_1^2$, which is smaller than the valid error $\sigma_2^2$. The
effect on the final test, which is due to a combination of these two factors,
clearly depends on the value of $\theta_{21}$. For intermediate values of
$\theta_{21}$ the latter factor is the predominant one, and the disturbance of
the effective significance level is increased as $n_1$ is increased.

2.7. Class B Test. A Class B test is one in which the probability level ($P_1$)
of the preliminary test is greater than the critical value $\bar{P}_1$. Pooling
is prescribed only when the mean square $v_1$ is relatively large, with the
result that the error mean square tends to be too large. Accordingly, a Class B
“sometimes pool” test rejects the hypothesis less frequently than a “never pool”
test at the $P_2$ level. The effective significance level of a Class B test is
less than $P_2$ for all values of $\theta_{21}$. It has its lowest value when
$\theta_{21}$ is equal to one, and approaches $P_2$ as $\theta_{21}$ becomes very
large. Because pooling is prescribed infrequently, little power is gained by the
use of a Class B test rather than a “never pool” test.


2.8. Recommendations. The principal conclusions discussed in the preceding
subsections may be summarized as follows: A preliminary test carried out at a
significance level as low as 5% affords little protection against errors in
judgement. If $\sigma_1^2$ is equal to $\sigma_2^2$ ($\theta_{21} = 1$) the
reduction in errors of inference is appreciable; but if, in fact, $\sigma_1^2$ is
less than $\sigma_2^2$ ($\theta_{21} > 1$), a greater number of incorrect
inferences are made than if a preliminary test is not employed at all. The use
of the 25% significance level for the preliminary test introduces the same
disturbances but to a lesser extent. Extreme increases in the effective
significance level at possible values of $\theta_{21}$ are reduced and losses in
power at these values are not as serious. The 25% level provides a reasonable
amount of protection against an error in judgement regarding the true value of
$\theta_{21}$. However, when $n_2$ is large relative to $n_1$, a smaller
significance level could be employed without introducing any serious
disturbances at the intermediate values of $\theta_{21}$ and with a resulting
gain in power at values of $\theta_{21}$ near one.

The following method of performing a preliminary test is recommended as one
which tends to stabilize the disturbances at intermediate values of
$\theta_{21}$ while still taking advantage of a considerable portion of the
possible gain in power at values of $\theta_{21}$ near one. The procedure
consists of pooling the two mean squares $v_2$ and $v_1$ only if their ratio is
less than $2F_{.50}$, where $F_{.50}$ is the 50 per cent point of the
F-distribution for $n_2$ and $n_1$ degrees of freedom. The use of the multiple 2
is arbitrary and a smaller value may be used if the experimenter desires
additional control over extreme disturbances.
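
The pooling rule just described can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 50 per cent point of F is assumed to be supplied by the user from a table, and the mean squares in the example are made-up numbers. With equal degrees of freedom the 50 per cent point is exactly 1, which is why the example needs no table.

```python
def should_pool(v2, v1, f50, multiple=2.0):
    """Pool when v2 / v1 is below multiple * F.50 (the rule of the text).

    v2: valid error mean square (n2 d.f.); v1: doubtful error mean square
    (n1 d.f.); f50: 50 per cent point of F for (n2, n1) d.f., supplied by
    the caller from a table.
    """
    if v1 <= 0 or v2 <= 0 or f50 <= 0:
        raise ValueError("mean squares and the F point must be positive")
    return v2 / v1 < multiple * f50


def pooled_mean_square(v1, n1, v2, n2):
    """Pooled error mean square on n1 + n2 degrees of freedom."""
    return (n1 * v1 + n2 * v2) / (n1 + n2)


# Hypothetical mean squares with n1 = n2 = 6, where F.50 is exactly 1.
for v2 in (3.1, 5.0):
    if should_pool(v2, 2.0, f50=1.0):
        print(v2, "pool ->", pooled_mean_square(2.0, 6, v2, 6))
    else:
        print(v2, "do not pool; use v2 alone")
```

Passing a smaller `multiple` corresponds to the tighter control over extreme disturbances mentioned in the text.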

This procedure has the advantage of admitting less disturbance over a larger
range of values of $n_2$ and $n_1$. The customary method prescribes pooling if
the null hypothesis ($\theta_{21} = 1$) of the preliminary test is not rejected
at some preassigned probability level $P_1$. If enough observations are
available to provide reliable values for $v_2$ and $v_1$, pooling is prescribed
only if $\sigma_2^2$ and $\sigma_1^2$ are essentially the same. However, if small
numbers of degrees of freedom are involved, the preliminary test is too weak to
reject the hypothesis even if $\sigma_1^2$ is appreciably less than
$\sigma_2^2$, and pooling will be prescribed too frequently. On the other hand,
the use of the recommended procedure has the effect of prescribing pooling only
when it can be said, with confidence exceeding 50%, that the true value of
$\theta_{21}$ is less than some chosen value such as 2.

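This can be demonstrated simply by considering a series of experiments
in which preliminary tests are performed. When $v_2/v_1 < 2F_{.50}$, we make the
statement

$$\theta_{21} < 2, \tag{1}$$

and when $v_2/v_1 > 2F_{.50}$, we make the statement

$$\theta_{21} > 2. \tag{2}$$

We have

$$\Pr\left\{\frac{v_2}{\theta_{21} v_1} < F_{.50}\right\} = .50 .$$

If statement (1) is true,

$$\Pr\left\{\frac{v_2}{v_1} < 2F_{.50}\right\} > .50,$$

and if statement (2) is true,

$$\Pr\left\{\frac{v_2}{v_1} > 2F_{.50}\right\} > .50.$$

Thus, no matter what the true value of $\theta_{21}$, the statements are true
more than 50% of the time.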

Fifty per cent points of the F-distribution have been tabulated by Merrington
and Thompson [3].

A simpler rule, and one which is nearly equivalent when the degrees of freedom
involved are each greater than 6, is to pool if the ratio of the mean squares is
less than 2, without any reference to the F-table. For smaller numbers of degrees
of freedom, however, this simpler rule does not embody the advantages of the
$2F_{.50}$ rule, unless, of course, $n_1$ and $n_2$ are equal.


III. NUMERICAL ILLUSTRATIONS


3.1. Effect of $P_1$ illustrated. An example of the influence of $P_1$ on the
effective significance level or Type I error of a “sometimes pool” test is
illustrated in Figure 1. When $P_1 = 0$, the Type I error has its maximum value,
equivalent to the Type I error of an “always pool” test at the $P_3$ level. As
$P_1$ increases from zero, the Type I error decreases until at
$P_1 = \bar{P}_1$ (.77 in this case) it reaches its minimum value at a level
less than $P_2$. As $P_1$ increases from $\bar{P}_1$, the Type I error increases
until, at $P_1 = 1$, the Type I error is equal to $P_2$.

The influence of $P_1$ on the power of a “sometimes pool” test is illustrated in
Figure 2. The gain in power, as a function of $\theta_{21}$, is presented for
three Class A tests. Since comparisons of power are made over tests having
different Type I errors, the gain is expressed as the proportion actually
attained of the total gain in power that is possible if the true value of
$\theta_{21}$ is actually known. When $P_1 = \bar{P}_1 = .77$, the curve is
observed to decrease monotonically to zero. However, for lower values of $P_1$,
the preliminary test prescribes pooling more often, and more power is gained
when $\theta_{21}$ is near one but less power is gained or power is actually
lost when $\theta_{21}$ is large.







The power gained or lost at various values of $\theta_{21}$ is illustrated in
Table I. The probability of rejecting the hypothesis for the “sometimes pool”
test is tabulated opposite “s.p.”, and for the “never pool” test having the same
Type I error opposite “n.p.”.

[Figure: Type I error plotted against $\theta_{21}$.]
Fig. 1. Effect of Varying $P_1$. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$ and
$P_2 = P_3 = .05$. (a) Upper diagram: Class A Tests. (b) Lower diagram: Class B
Tests.

[Figure: gain in power.]
Fig. 2. Proportion of Possible Gain in Power Actually Attained. $n_1 = 20$,
$n_2 = 4$, $n_3 = 2$, $P_2 = P_3 = .05$.




















The last line of the table approaches the probabilities for a “never pool” test
having a Type I error of 5%. Except for values very near
$(\theta_{21}, \theta_{32}) = (1, 1)$, the probability of rejecting the null
hypothesis, using a “sometimes pool” test, is greater than if a “never pool”
test at the 5% level is used. In this sense, the







TABLE I

Comparison of Power of a “Sometimes Pool” (s.p.) Test and Corresponding
“Never Pool” (n.p.) Tests

$n_1 = 20$, $n_2 = 4$, $n_3 = 2$; $P_1 = P_2 = P_3 = .05$

















$\theta_{21}$ | Type of Test | Type I Error ($\theta_{32} = 1$) | Value of $\theta_{32}$: 1.8, 2.8, 4.3, 7.4, 12.5, 25, 50, 250











.048 
.048 











.067 
.067 


.102 
.102 












127 
127 


845 








.146 
.146 










.482 
.528 











831 
.882 











.402 









.148 
.148 










.309 
405 


.399 
.531 


.796 


.280 .883 



















Ay 
eee 






.182 
.234 






.255 
.352 


.390 
478 


781 
.862 





751 











































 10    s.p.   .091   .152   .220   .320   .465   .621   .776   .877   .974
       n.p.   .091   .191   .300   .422   .569   .712   .838   .913   .982
 16    s.p.   .067   .130   .209   .313    …     .615    …      …      …
       n.p.   .067   .149   .245   .361   .509   .662   .805   .895   .978
100    s.p.   .051    …     .200   .307   .452   .613   .771   .875   .973
       n.p.   .051   .118   .201   .308   .454   .615   .773   .875   .973

Below the heavy line the s.p. test is less powerful than the n.p. test.

“power” of the “sometimes pool” test is greater everywhere except near
$(\theta_{21}, \theta_{32}) = (1, 1)$.

3.2. Effect of $P_2$, $P_3$ illustrated. The influence of the probability levels
employed in the final phase of a “sometimes pool” test is illustrated in
Figure 3. The main effect is observed to be the manner in which the behaviour is
constrained at the extreme values of $\theta_{21}$.


[Figure: Type I error.]
Fig. 3. Class A Tests; $n_1 = 20$, $n_2 = 4$, $n_3 = 2$.

[Figure: Type I error, two diagrams.]
Fig. 4. (a) Upper Diagram: Effect of Varying $n_2$. $P_1 = P_2 = P_3 = .05$ and
$n_1 = 20$, $n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$.
$P_1 = P_2 = P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.


3.3. Effect of $n_2$, $n_1$ illustrated. The response of the Type I error to
increases in the degrees of freedom of the preliminary test is illustrated in
Figure 4. The maximum disturbance is observed to increase as $n_1$ increases or
as $n_2$ decreases.







3.4. Class B test illustrated. The behaviour of the Type I error of some Class
B tests is illustrated in Figure 1(b). The hypothesis is always rejected less
frequently than if a “never pool” test at the $P_2$ level is used.










[Figure: Type I error plotted against $\theta_{21}$, two diagrams. Upper diagram
curves: $n_2 = 4$, $P_1 = .18$; $n_2 = 12$, $P_1 = .09$; $n_2 = 20$,
$P_1 = .06$. Lower diagram curves: $n_1 = 20$, $P_1 = .18$; $n_1 = 12$,
$P_1 = .20$; $n_1 = 4$, $P_1 = .26$.]
Fig. 5. (a) Upper Diagram: Effect of Varying $n_2$ when $F_1 = 2F_{.50}$,
$P_2 = P_3 = .05$ and $n_1 = 20$, $n_3 = 2$. (b) Lower Diagram: Effect of
Varying $n_1$ when $F_1 = 2F_{.50}$, $P_2 = P_3 = .05$ and $n_2 = 4$,
$n_3 = 2$.

[Figure: Type I error. Curves: $n_1 = 20$, $P_1 = .18$; $n_1 = 12$,
$P_1 = .20$; $n_1 = 4$, $P_1 = .26$.]
Fig. 6. Effect of Varying $n_1$ when $P_2 > P_3$. $P_2 = .10$, $P_3 = .05$ and
$n_2 = 4$, $n_3 = 2$.


3.5. Recommended procedure illustrated. Figure 5 illustrates the behaviour
of the Type I error when the recommended procedure is applied to the special
cases presented in Figure 4. When $n_1 = 12$, $n_2 = 4$, the 20% probability
level is prescribed and the Type I error never exceeds .09. When $n_1 = 20$,
$n_2 = 20$, the more liberal value of 6% is prescribed and the resulting Type I
error never exceeds .07. The more liberal choice of $P_1$ results in a greater
gain of power, near $\theta_{21} = 1$, than would have resulted if the 20% level
had been used throughout. A small loss in power occurs when $\theta_{21}$ is
large. Should the experimenter wish to guard against this loss in power for a
larger range of values of $\theta_{21}$ near one, he may do so, at the expense
of a somewhat larger disturbance in the Type I error, by choosing $P_2$ larger
than $P_3$. In the present example, if $P_2$ is taken as .10 instead of .05,
Figure 6 shows that the Type I error is changed only slightly for values of
$\theta_{21}$ near one, but the maximum disturbance is increased. Such a test is
uniformly more powerful than the “never pool” test for all values of
$\theta_{21}$ for which the Type I error is less than .10, a much larger range
of values than in the previous case.


IV. DERIVATIONS AND PROOFS 


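4.1. Derivation of joint frequency function. The joint frequency function of
the $v$'s is given by

$$c_1\, v_1^{\frac12 n_1 - 1}\, v_2^{\frac12 n_2 - 1}\, v_3^{\frac12 n_3 - 1}
\exp\left\{-\tfrac12\left(\frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2}
+ \frac{n_3 v_3}{\sigma_3^2}\right)\right\},$$

where $c_1$ is independent of the $v$'s. Transform to new variables:

$$u_1 = \frac{n_2 v_2}{n_1 v_1}, \qquad u_2 = \frac{n_3 v_3}{n_2 v_2},$$

together with a third variable $u_3$ proportional to $n_1 v_1$. By integrating
over $u_3$ and evaluating the constant, the joint frequency function of $u_1$
and $u_2$ is obtained:

$$p = \frac{1}{B(\tfrac12 n_2, \tfrac12 n_1)\, B(\tfrac12 n_3, \tfrac12 (n_1 + n_2))}
\cdot \frac{\theta_{21}^{\frac12 n_1}\, \theta_{32}^{\frac12 (n_1 + n_2)}\,
u_1^{\frac12 (n_2 + n_3) - 1}\, u_2^{\frac12 n_3 - 1}}
{(\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2)^{\frac12 (n_1 + n_2 + n_3)}},
\tag{3}$$

where $\theta_{21} = \sigma_2^2/\sigma_1^2$ and
$\theta_{32} = \sigma_3^2/\sigma_2^2$.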


4.2. Definition of critical region. The rule of procedure for the “sometimes
pool” test may now be expressed in terms of the $u$'s. Reject the hypothesis
$\theta_{32} = 1$ if

$$\begin{cases} u_1 \ge u_1^0, \\ u_2 \ge u_2^0, \end{cases}
\qquad\text{or}\qquad
\begin{cases} u_1 < u_1^0, \\ \dfrac{u_1 u_2}{1 + u_1} \ge u_3^0, \end{cases}$$

where

$$u_1^0 = \frac{n_2}{n_1}\, F_1(n_2, n_1; P_1), \qquad
u_2^0 = \frac{n_3}{n_2}\, F_2(n_3, n_2; P_2), \qquad
u_3^0 = \frac{n_3}{n_1 + n_2}\, F_3(n_3, n_1 + n_2; P_3).$$
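
The rule of procedure can be written out as a short Python sketch, once in terms of the ratios $u_1$, $u_2$ and once in terms of the mean squares, to show that the two forms prescribe the same decision. The function names are hypothetical, and the percentage points in the example (2.866, 6.944, 3.403 for 5% levels with $n_1 = 20$, $n_2 = 4$, $n_3 = 2$) are values assumed from standard F tables rather than quoted from this paper; the mean squares are made-up numbers.

```python
def sometimes_pool_u(v1, v2, v3, n1, n2, n3, f1, f2, f3):
    """Decision in terms of u1, u2. Returns (reject, pooled)."""
    u1 = n2 * v2 / (n1 * v1)
    u2 = n3 * v3 / (n2 * v2)
    u1_0 = n2 * f1 / n1            # preliminary-test boundary
    u2_0 = n3 * f2 / n2            # final test without pooling
    u3_0 = n3 * f3 / (n1 + n2)     # final test with pooling
    if u1 >= u1_0:
        return (u2 >= u2_0, False)
    return (u1 * u2 / (1.0 + u1) >= u3_0, True)


def sometimes_pool_ms(v1, v2, v3, n1, n2, n3, f1, f2, f3):
    """The same decision expressed directly with mean-square ratios."""
    if v2 / v1 >= f1:
        return (v3 / v2 >= f2, False)
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return (v3 / pooled >= f3, True)


# Assumed 5% points for (n1, n2, n3) = (20, 4, 2); hypothetical mean squares.
points = dict(f1=2.866, f2=6.944, f3=3.403)
for v in [(1.0, 1.2, 5.0), (1.0, 4.0, 20.0), (1.0, 1.2, 2.0)]:
    print(v, sometimes_pool_u(*v, 20, 4, 2, **points),
          sometimes_pool_ms(*v, 20, 4, 2, **points))
```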




The reader will note that the $u$'s are ratios of sums of squares. The symbol
$u_1$ is associated with the preliminary test. The final test when pooling is
not prescribed is associated with the symbol $u_2$, and when pooling is
prescribed the relevant statistic is $u_1 u_2/(1 + u_1)$.

The critical region defined in this way is illustrated in the two dimensional
sample space $\{u_1, u_2\}$ of Figure 7(a). The critical regions of the “never
pool” and the “always pool” test are readily identified in this figure. The
region of a “never pool” test at the $P_2$ level is designated by
$A + B_2 + C$, the area above the line $u_2 = u_2^0$; and the region of an
“always pool” test at the $P_3$ level is designated by $B_1 + B_2 + C + D$, the
area above the curve $u_1 u_2 = u_3^0(1 + u_1)$. The critical region of the
“sometimes pool” test, $B_1 + B_2 + C$, may be considered in two parts: the
portion due to pooling, $B_1 + B_2$, and the portion due to not pooling, $C$.


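[Figure: critical regions in the $(u_1, u_2)$ plane, bounded by the pooling
curve $u_1 u_2 = u_3^0(1 + u_1)$, with $\bar{u}_1$ and $u_1^0$ marked on the
$u_1$ axis.]
Fig. 7. Critical Region of “Sometimes Pool” Test. (a) Left: Class A Test:
$u_1^0 > \bar{u}_1$. (b) Right: Class B Test: $u_1^0 < \bar{u}_1$.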


The probability of rejecting the null hypothesis is given by

$$Q(\theta_{21}, \theta_{32}) = \int_0^{u_1^0}\!\!\int_w^{\infty} p \, du_2\, du_1
+ \int_{u_1^0}^{\infty}\!\!\int_{u_2^0}^{\infty} p \, du_2\, du_1, \tag{4}$$

where $p$ is the frequency function (3), and $w = u_3^0(1 + u_1)/u_1$.

Simple explicit expressions for these integrals cannot be obtained in general,
but when $n_3 = 2$ they can be reduced to forms containing incomplete beta
functions. This special case is dealt with in Subsection 4.7.


4.3. Critical value of $P_1$. The symbol $\bar{u}_1$ in Figure 7 is used to
denote the $u_1$ coordinate of the point of intersection of the line
$u_2 = u_2^0$ and the curve $u_1 u_2 = u_3^0(1 + u_1)$. Accordingly,

$$\bar{u}_1 = \frac{u_3^0}{u_2^0 - u_3^0}, \tag{5}$$

a value readily determined for any given test. This relationship may be expressed
in terms of the $F$'s as

$$\bar{F}_1 = \frac{1}{\dfrac{n_2}{n_1}\left(\dfrac{F_2}{F_3} - 1\right)
+ \dfrac{F_2}{F_3}}, \tag{6}$$

where $\bar{F}_1$ is defined by $n_1 \bar{u}_1 = n_2 \bar{F}_1$. The probability
level corresponding to $\bar{F}_1$ is denoted by $\bar{P}_1$.

The critical value $\bar{P}_1$ is the value of $P_1$ which divides the possible
“sometimes pool” tests into two types having different properties. If $P_1$ is
less than $\bar{P}_1$ ($F_1 > \bar{F}_1$ or $u_1^0 > \bar{u}_1$), the test is
referred to as a Class A test. If $P_1$ is greater than $\bar{P}_1$
($F_1 < \bar{F}_1$ or $u_1^0 < \bar{u}_1$), the test is referred to as a Class B
test.
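
As a numerical check on the critical value, the sketch below evaluates it in Python for the example used throughout ($n_1 = 20$, $n_2 = 4$, $n_3 = 2$, final levels of .05). It rests on two assumptions that are not taken from the paper: the upper percentage point of F with 2 numerator degrees of freedom is computed from the closed form $\Pr\{F > f\} = (1 + 2f/m)^{-m/2}$, and the incomplete beta function is evaluated by a binomial sum, which restricts the helper to even $n_1$ and $n_2$. All function names are hypothetical.

```python
from math import comb


def upper_f_point_num2(alpha, m):
    """Upper-alpha point of F with (2, m) d.f., from the closed form
    Pr{F > f} = (1 + 2 f / m) ** (-m / 2)."""
    return (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)


def reg_inc_beta_int(x, a, b):
    """Incomplete beta ratio I_x(a, b) for positive integers a and b."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j)
               for j in range(a, n + 1))


def critical_p1(n1, n2, n3, f2, f3):
    """Return (u1_bar, F1_bar, P1_bar); even n1 and n2 only."""
    u2_0 = n3 * f2 / n2
    u3_0 = n3 * f3 / (n1 + n2)
    u1_bar = u3_0 / (u2_0 - u3_0)             # intersection of line and curve
    f1_bar = n1 * u1_bar / n2                 # n1 * u1_bar = n2 * F1_bar
    x = u1_bar / (1.0 + u1_bar)               # beta argument for F(n2, n1)
    p1_bar = 1.0 - reg_inc_beta_int(x, n2 // 2, n1 // 2)
    return u1_bar, f1_bar, p1_bar


f2 = upper_f_point_num2(0.05, 4)      # F2(2, 4; .05)
f3 = upper_f_point_num2(0.05, 24)     # F3(2, 24; .05)
print(critical_p1(20, 4, 2, f2, f3))  # last entry is close to the .77 of 2.4
```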


4.4. Lemma 1.

Lemma 1. If $\theta_{21} \ge \theta_{21}^0$ and $\theta_{32} \ge \theta_{32}^0$,
and if the equality applies in at most one of these, then the ratio of the
frequency functions (3)

$$\frac{p(u_1, u_2 \mid \theta_{21}, \theta_{32})}
{p(u_1, u_2 \mid \theta_{21}^0, \theta_{32}^0)} \tag{7}$$

increases monotonically as (i) $u_1$ increases with $u_2$ fixed, or as (ii)
$u_2$ increases with $u_1$ fixed, or as (iii) $u_1$ increases on a fixed pooling
curve $u_1 u_2 = u_3^0(1 + u_1)$.

Proof. The ratio (7) is a monotonic function of

$$\frac{\theta_{21}^0\theta_{32}^0 + \theta_{32}^0 u_1 + u_1 u_2}
{\theta_{21}\theta_{32} + \theta_{32} u_1 + u_1 u_2}.$$

It is easily shown that an expression of the form $(a + bx)/(c + dx)$ increases
monotonically with respect to $x$ if $a/c < b/d$, and this condition holds for
cases (i), (ii), and (iii).

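4.5. Lemma 2.

Lemma 2. If area $L$ lies above a given pooling curve, and to the right of a
given preliminary line, if area $K$ lies below the same pooling curve, and to
the left of the same preliminary line, and if

$$\Pr\{L \mid \theta_{21}^0, \theta_{32}^0\} \ge
\Pr\{K \mid \theta_{21}^0, \theta_{32}^0\},$$

then

$$\Pr\{L \mid \theta_{21}, \theta_{32}\} > \Pr\{K \mid \theta_{21}, \theta_{32}\},$$

where $\theta_{21} \ge \theta_{21}^0$ and $\theta_{32} \ge \theta_{32}^0$ and the
equality applies in at most one of these.

Proof. For any point $(u_1', u_2')$ in $K$ and any point $(u_1'', u_2'')$ in
$L$, Lemma 1 (iii) yields

$$\frac{p(u_1', w' \mid \theta_{21}, \theta_{32})}
{p(u_1', w' \mid \theta_{21}^0, \theta_{32}^0)}
< \frac{p(u_1'', w'' \mid \theta_{21}, \theta_{32})}
{p(u_1'', w'' \mid \theta_{21}^0, \theta_{32}^0)},$$

where $w' = c(1 + u_1')/u_1'$, $w'' = c(1 + u_1'')/u_1''$, and $c$ is a constant
defined by the pooling curve $w = c(1 + u_1)/u_1$. Since $K$ is below the given
pooling curve, $u_2' < w'$ and, by Lemma 1 (ii),

$$\frac{p(u_1', u_2' \mid \theta_{21}, \theta_{32})}
{p(u_1', u_2' \mid \theta_{21}^0, \theta_{32}^0)}
< \frac{p(u_1', w' \mid \theta_{21}, \theta_{32})}
{p(u_1', w' \mid \theta_{21}^0, \theta_{32}^0)}.$$

In the same way, since $L$ is above the pooling curve, $u_2'' > w''$ and the
ratio at $(u_1'', u_2'')$ exceeds the ratio at $(u_1'', w'')$. Consider

$$\frac{p(u_1', u_2' \mid \theta_{21}, \theta_{32})}
{p(u_1', u_2' \mid \theta_{21}^0, \theta_{32}^0)}
< b <
\frac{p(u_1'', u_2'' \mid \theta_{21}, \theta_{32})}
{p(u_1'', u_2'' \mid \theta_{21}^0, \theta_{32}^0)},$$

where $b$ is a constant such that the inequalities hold for all $(u_1', u_2')$
in $K$ and all $(u_1'', u_2'')$ in $L$.

Integrating over the regions yields

$$\Pr\{K \mid \theta_{21}, \theta_{32}\} <
b \Pr\{K \mid \theta_{21}^0, \theta_{32}^0\}$$

and

$$b \Pr\{L \mid \theta_{21}^0, \theta_{32}^0\} <
\Pr\{L \mid \theta_{21}, \theta_{32}\}.$$

But

$$\Pr\{K \mid \theta_{21}^0, \theta_{32}^0\} \le
\Pr\{L \mid \theta_{21}^0, \theta_{32}^0\};$$

thus

$$\Pr\{K \mid \theta_{21}, \theta_{32}\} < \Pr\{L \mid \theta_{21}, \theta_{32}\},$$

which completes the proof.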


4.6. General Properties.

Result 1. When $\theta_{21} = 1$, the Type I error of a Class A test is less
than $P_3$.

Proof. In the notation of Fig. 7(a), the probability of falling in
$B_1 + B_2 + C + D$ is $P_3$ when $\theta_{21} = 1$ and $\theta_{32} = 1$. The
region of rejection of the “sometimes pool” test is smaller by $D$.

Result 2. When $\theta_{21} = 1$, the Type I error of a Class A test is greater
than $(1 - P_1)P_3$.

Proof. The statistics $u_1$ and $u_1 u_2/(1 + u_1)$ are independent when
$\theta_{21} = 1$ and $\theta_{32} = 1$. Under these conditions, the probability
of falling in $B_1 + B_2$, in the notation of Fig. 7(a), is equal to the product
of two incomplete beta functions having the values $(1 - P_1)$ and $P_3$.
Consequently, the Type I error is greater than $(1 - P_1)P_3$.
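
Results 1 and 2 bracket the Type I error of a Class A test, when both variance ratios equal one, between $(1 - P_1)P_3$ and $P_3$. The Monte Carlo sketch below, in Python, is one way to see this for $n_1 = 20$, $n_2 = 4$, $n_3 = 2$ with all levels at .05, where the bracket is .0475 to .05. It is an illustration only: the function names, the seed, and the number of trials are arbitrary choices, and the percentage points 2.866, 6.944 and 3.403 are assumed from standard F tables rather than taken from the paper.

```python
import random


def chi_square(rng, df):
    """Chi-square variate as a gamma variate with shape df / 2 and scale 2."""
    return rng.gammavariate(df / 2.0, 2.0)


def type_one_error_mc(n1, n2, n3, f1, f2, f3, theta21=1.0,
                      trials=100_000, seed=1950):
    """Estimate the rejection rate of the "sometimes pool" test when
    theta32 = 1, taking sigma1^2 = 1 and sigma2^2 = sigma3^2 = theta21."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        v1 = chi_square(rng, n1) / n1
        v2 = theta21 * chi_square(rng, n2) / n2
        v3 = theta21 * chi_square(rng, n3) / n3
        if v2 / v1 >= f1:                       # preliminary test significant
            reject = v3 / v2 >= f2              # never-pool final test
        else:
            pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
            reject = v3 / pooled >= f3          # pooled final test
        rejections += reject
    return rejections / trials


estimate = type_one_error_mc(20, 4, 2, f1=2.866, f2=6.944, f3=3.403)
print("estimated Type I error at theta21 = 1:", estimate)
```

With 100,000 trials the standard error of the estimate is roughly .0007, so the simulation can only confirm the bracket approximately.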

Result 3. The Type I error approaches $P_2$ as $\theta_{21}$ approaches
infinity.

Proof. The distribution becomes singular when $\theta_{21} = \infty$. The
frequency function approaches zero uniformly for any finite value of $u_1$ and
approaches

$$\frac{1}{B(\tfrac12 n_3, \tfrac12 n_2)} \cdot
\frac{u_2^{\frac12 n_3 - 1}}{(1 + u_2)^{\frac12 (n_2 + n_3)}}$$

at $u_1 = \infty$. When $\theta_{21} = \infty$, the entire mass is concentrated
on the line $u_1 = \infty$ and is distributed as a beta variable along that
line. In the notation of Fig. 7(a), $\Pr(B_1 + B_2) \to 0$ and
$\Pr(C) \to P_2$.

Result 4. If the Type I error of a Class A test is $Q_0$ for $\theta_{21}^0$,
then for $\theta_{21} > \theta_{21}^0$, the Type I error is greater than $r$,
where $r$ is equal to the lesser of $Q_0$ and $P_2$.









Three useful corollaries are associated with the above result:

Result 4.1. If at $\theta_{21} = 1$, the value of the Type I error is less than
$P_2$, this is its minimum value for any $\theta_{21}$.

Result 4.2. If at $\theta_{21} = 1$, the Type I error is less than $P_2$, then
as $\theta_{21}$ increases from 1 the Type I error increases monotonically until
$P_2$ is reached.

Result 4.3. If for some value of $\theta_{21}$ the Type I error is equal to or
greater than $P_2$, then for any larger value of $\theta_{21}$, the Type I error
is greater than $P_2$.

Proof. Let the regions of Fig. 8 be denoted by $R_1 = A_1 + B_1 + C_1$ with
similar designations for $R_2$ and $R_3$. Let
$R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$.

If $r = Q_0$, let the non-pooling line between $R_1$ and $R_2$ in Fig. 8
correspond to $Q_0$ for all $\theta_{21}$. Then
$\Pr\{R_4 \mid \theta_{21}^0, 1\} = \Pr\{R_1 \mid \theta_{21}^0, 1\}$, whence
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}^0, 1\} =
\Pr\{A_1 \mid \theta_{21}^0, 1\}$. By Lemma 2, we have for any
$\theta_{21} > \theta_{21}^0$,
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} >
\Pr\{A_1 \mid \theta_{21}, 1\}$ and
$\Pr\{R_4 \mid \theta_{21}, 1\} > \Pr\{R_1 \mid \theta_{21}, 1\} = Q_0$.





[Figure: regions $R_1$, $R_2$, $R_3$ and their parts $A_i$, $B_i$, $C_i$ in the
$(u_1, u_2)$ plane.]
Fig. 8. Critical Regions for Result 4.


If $r = P_2$, let the non-pooling line at the lower boundary of $R_3$ in Fig. 8
correspond to $Q_0$ for all $\theta_{21}$. Then in the same way
$\Pr\{B_4 \mid \theta_{21}^0, 1\} =
\Pr\{A_1 + A_2 + A_3 + C_3 \mid \theta_{21}^0, 1\}$ and
$\Pr\{B_4 \mid \theta_{21}, 1\} > \Pr\{A_1 + A_2 + A_3 \mid \theta_{21}, 1\}$ by
Lemma 2. Thus
$\Pr\{R_4 \mid \theta_{21}, 1\} >
\Pr\{R_1 + R_2 + A_3 + B_3 \mid \theta_{21}, 1\}$ and
$\Pr\{R_4 \mid \theta_{21}, 1\} > \Pr\{R_1 + R_2 \mid \theta_{21}, 1\} = P_2$.

Result 5. For a Class B test, the Type I error is less than $P_2$ for all
$\theta_{21}$.

Proof. Figure 7(b) illustrates the critical region of a Class B test. We have
$\Pr\{A + B + C_1 + C_2 + C_3\} = P_2$. But the region of rejection of the
“sometimes pool” test is smaller, excluding $A$.

Result 6. The Type I error of a Class B test, for $\theta_{21} = 1$, is greater
than $(1 - \bar{P}_1)P_3$.

Proof. Changing $P_1$ to $\bar{P}_1$ removes $C_2$ from the region of rejection
in Fig. 7(b), thus decreasing the Type I error. The modified test lies in both
Class B and Class A, so that Result 2 applies.

Result 7. For any $\theta_{21}$, the Type I error is a minimum for changes of
$P_1$ when $P_1 = \bar{P}_1$.







Proof. For a Class A test, changing $P_1$ to $\bar{P}_1$ removes region $B_1$ of
Fig. 7(a), thus decreasing the Type I error. For a Class B test, changing $P_1$
to $\bar{P}_1$ removes region $C_2$ of Fig. 7(b), similarly decreasing the Type
I error.

Result 8. A Class A test, in which the Type I error is less than or equal to
$P_2$, is more powerful than a “never pool” test having the same Type I error.

Proof. In Fig. 8, let region $R_1 = A_1 + B_1 + C_1$ be equal in size to
$R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$. Then
$\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$ and
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} =
\Pr\{A_1 \mid \theta_{21}, 1\}$. Increasing $\theta_{32}$ from 1 and applying
Lemma 2 yields
$\Pr\{R_4 \mid \theta_{21}, \theta_{32}\} >
\Pr\{R_1 \mid \theta_{21}, \theta_{32}\}$.

Result 9. For a fixed Type I error a Class A test, carried out at given levels
of $P_2$ and $P_3$, is more powerful than a Class B test at the same levels.

Proof. Fig. 7 and Lemma 2 apply at once.


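4.7. Closed form expressions for $n_3 = 2$. The probability of rejecting the
hypothesis in a “sometimes pool” test is given by
$Q(\theta_{21}, \theta_{32}) = Q_1 + Q_2$, where $Q_1$ corresponds to the region
$B$ and $Q_2$ to the region $C$ of Fig. 7.

The integrals (4) representing the probability of rejecting the null hypothesis
reduce, when $n_3 = 2$, to

$$Q_1 = \frac{I_z(\tfrac12 n_2, \tfrac12 n_1)}
{\left(1 + \dfrac{u_3^0}{\theta_{32}}\right)^{\frac12 n_2}
\left(1 + \dfrac{u_3^0}{\theta_{21}\theta_{32}}\right)^{\frac12 n_1}}, \tag{8}$$

where the argument $z$ of the incomplete beta function is defined by
$z = x/(1 + x)$, where

$$x = \frac{u_1^0}{\theta_{21}} \cdot
\frac{1 + u_3^0/\theta_{32}}{1 + u_3^0/(\theta_{21}\theta_{32})}. \tag{9}$$

Under the null hypothesis $\theta_{32} = 1$,

$$Q_1 = I_z(\tfrac12 n_2, \tfrac12 n_1)
\left[\frac{1 + u_3^0}{1 + u_3^0/\theta_{21}}\right]^{\frac12 n_1} P_3, \tag{10}$$

since

$$P_3 = \frac{1}{(1 + u_3^0)^{\frac12 (n_1 + n_2)}}.$$

Similarly

$$Q_2 = \frac{I_{z'}(\tfrac12 n_1, \tfrac12 n_2)}
{\left(1 + \dfrac{u_2^0}{\theta_{32}}\right)^{\frac12 n_2}}, \tag{11}$$

where the argument $z'$ of the incomplete beta function is defined by
$z' = 1/(1 + x')$, where

$$x' = \left(1 + \frac{u_2^0}{\theta_{32}}\right)\frac{u_1^0}{\theta_{21}}.
\tag{12}$$

Under the null hypothesis $\theta_{32} = 1$,

$$Q_2 = I_{z'}(\tfrac12 n_1, \tfrac12 n_2)\, P_2, \tag{13}$$

since

$$P_2 = \frac{1}{(1 + u_2^0)^{\frac12 n_2}}.$$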
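
The closed forms (8) to (13) lend themselves to direct computation. The Python sketch below evaluates $Q_1 + Q_2$ for $n_3 = 2$; it is an illustration under stated assumptions rather than the author's own computing scheme. The incomplete beta ratio is evaluated by a binomial sum, which limits the helper to even $n_1$ and $n_2$; the critical values $u_2^0$ and $u_3^0$ are obtained by inverting the expressions for $P_2$ and $P_3$ given above; and the 5% point 2.866 of F with (4, 20) degrees of freedom is assumed from standard tables. Function names are hypothetical.

```python
from math import comb


def reg_inc_beta_int(x, a, b):
    """Incomplete beta ratio I_x(a, b) for positive integers a and b."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1.0 - x) ** (n - j)
               for j in range(a, n + 1))


def rejection_probability(theta21, theta32, n1, n2, u1_0, u2_0, u3_0):
    """Q = Q1 + Q2 for n3 = 2, following (8), (9), (11), (12)."""
    a1, a2 = n1 // 2, n2 // 2                      # requires even n1, n2
    t = theta21 * theta32
    x = (u1_0 / theta21) * (1.0 + u3_0 / theta32) / (1.0 + u3_0 / t)
    z = x / (1.0 + x)
    q1 = reg_inc_beta_int(z, a2, a1) / (
        (1.0 + u3_0 / theta32) ** a2 * (1.0 + u3_0 / t) ** a1)
    x2 = (1.0 + u2_0 / theta32) * u1_0 / theta21
    z2 = 1.0 / (1.0 + x2)
    q2 = reg_inc_beta_int(z2, a1, a2) / (1.0 + u2_0 / theta32) ** a2
    return q1 + q2


# Example of the text: n1 = 20, n2 = 4, n3 = 2, P1 = P2 = P3 = .05.
u1_0 = 4 * 2.866 / 20                 # assumed table value of F1(4, 20; .05)
u2_0 = 0.05 ** (-2.0 / 4) - 1.0       # from P2 = (1 + u2_0) ** (-n2 / 2)
u3_0 = 0.05 ** (-2.0 / 24) - 1.0      # from P3 = (1 + u3_0) ** (-(n1 + n2) / 2)
for theta21 in (1.0, 10.0, 1000.0):
    print(theta21, rejection_probability(theta21, 1.0, 20, 4, u1_0, u2_0, u3_0))
```

Under these assumptions the Type I error at $\theta_{21} = 10$ comes out near the .091 entered in Table I, and for very large $\theta_{21}$ it tends to $P_2 = .05$, in line with Result 3.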
The incomplete beta function has been tabulated by Pearson [4]. 
The author wishes to thank Professor W. G. Cochran and Professor John W. 
Tukey for helpful advice in the preparation of this paper. 


REFERENCES 


[1] T. A. Bancroft, “On biases in estimation due to the use of preliminary tests
of significance,” Annals of Math. Stat., Vol. 15 (1944), pp. 190-204.

[2] Frederick Mosteller, “On pooling data,” Jour. Am. Stat. Assn., Vol. 43
(1948), pp. 231-242.

[3] M. Merrington and C. M. Thompson, “Tables of percentage points of the
inverted beta (F) distribution,” Biometrika, Vol. 33 (1943), pp. 73-88.

[4] Karl Pearson, Tables of the Incomplete Beta Function, Cambridge University
Press, 1934.


ESTIMATING THE MEAN AND VARIANCE OF NORMAL POPULATIONS 
FROM SINGLY TRUNCATED AND DOUBLY TRUNCATED SAMPLES¹


By A. C. Cohen, Jr.
The University of Georgia 


1. Summary. This paper is concerned with the problem of estimating the 
mean and variance of normal populations from singly and doubly truncated 
samples having known truncation points. Maximum likelihood estimating equa- 
tions are derived which, with the aid of standard tables of areas and ordinates 
of the normal frequency function, can be readily solved by simple iterative 
processes. Asymptotic variances and covariances of these estimates are ob- 
tained from the information matrices. Numerical examples are given which 
illustrate the practical application of these results. In Sections 3 to 8 inclusive, 
the following cases of doubly truncated samples are considered: I, number of
unmeasured observations unknown; II, number of unmeasured observations in
each ‘tail’ known; and III², total number of unmeasured observations known,
but not the number in each ‘tail’. In Section 9, singly truncated samples are
treated as special cases of I and II above.


2. Introduction. In practice, truncated samples arise with various types of
experimental data in which recorded measurements are available over only a
partial range of the variable. Such samples are usually classified according to
the form of the population (complete) distribution; according to whether the
truncation points are known or unknown; and according to whether the number
of unmeasured (missing) observations is known or unknown. In this paper, the
further classification of singly truncated or doubly truncated is made, according
as one or both ‘tails’ of the sample have been removed. Pearson and Lee [1, 2],
Fisher [3], Hald [4]³, and this writer [5] studied singly truncated normal samples
with a known truncation point when the number of unmeasured observations is
unknown. Stevens [6], Cochran [7], and Hald [4] studied similar samples with a
known number of unmeasured observations. Stevens [6] also considered doubly
truncated normal samples with known truncation points when the number of
unmeasured observations in each ‘tail’ is known. In each of these papers,
equations were derived with which maximum likelihood estimates of the population
mean and variance can be computed from samples of the type considered.
With the exception of [5], which uses standard tables of the normal frequency
function, practical application of the various estimating equations involves
use of special tables which may frequently be unavailable.

¹ Based on papers presented before the American Mathematical Society, Durham,
North Carolina, April 2, 1949, and before a joint meeting of the Institute of
Mathematical Statistics and the Biometric Society, Chapel Hill, North Carolina,
March 18, 1950.

² The problem involved in this case was recently called to the writer’s
attention by Churchill Eisenhart.

³ Reference [4] appeared while this paper was awaiting publication. Minor
revisions have been made in view of Hald’s results.


3. Case I. Number of unmeasured observations unknown. Let $x_0$ designate
the left truncation point, $x_0 + R$ the right truncation point, and hence $R$
the sample range. Let $n_0$ be the number of measured observations with values
equal to or between the truncation points. In this case, the number of
unmeasured observations is assumed to be unknown. We translate the origin to the
left terminus by the change of variable $x = x' - x_0$, and designate the left
and right truncation points in standard units of the population (complete
distribution) as $\xi'$ and $\xi''$, respectively. We can write the probability
density function for this case as

$$f(x) = \frac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\,
e^{-\frac12 (\xi' + x/\sigma)^2}, \qquad 0 \le x \le R, \tag{1}$$

where

$$I_0' = \frac{1}{\sqrt{2\pi}} \int_{\xi'}^{\infty} e^{-t^2/2}\, dt, \qquad
I_0'' = \frac{1}{\sqrt{2\pi}} \int_{\xi''}^{\infty} e^{-t^2/2}\, dt, \tag{2}$$

and

$$\mu = x_0 - \sigma\xi'. \tag{3}$$

Thus $(I_0' - I_0'')$ is the area under the normal curve between ordinates
erected at $\xi'$ and $\xi''$ respectively. Moreover
$(I_0' - I_0'') = P(x_0 \le x' \le x_0 + R)$. The likelihood function for such a
sample is

$$P(x_1, x_2, \ldots, x_{n_0}) =
\left[\frac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\right]^{n_0}
e^{-\frac12 \sum_1^{n_0} (\xi' + x_i/\sigma)^2}. \tag{4}$$

Since $R$ is the truncated range, and since $\xi'$ and $\xi''$ are in standard
units, we have

$$\xi'' = \xi' + R/\sigma. \tag{5}$$

It should be understood that $\xi'$ is considered throughout this paper as the
independent parameter of location. The mean, $\mu$, cf. (3), is a linear
function of $\xi'$.

In the derivations which follow, we employ the Fisher $I_n$ functions, where
$I_0(\xi)$ is defined by (2) and

$$I_n(\xi) = \int_{\xi}^{\infty} I_{n-1}(t)\, dt, \tag{6}$$

and hence

$$\frac{dI_n}{d\xi} = -I_{n-1}.$$

These functions satisfy the recurrence formula

$$(n + 1) I_{n+1} + \xi I_n - I_{n-1} = 0. \tag{7}$$

$I_n(\xi)$ is ordinarily abbreviated to $I_n$ in this paper. Where no confusion
seems likely to occur, similar abbreviations are used for other functions of
$\xi$.

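We now obtain certain relations for use in subsequent derivations. Equations
(2), (5), and (6) enable us to write

$$\frac{\partial I_0'}{\partial \xi'} = -I_{-1}' = -\varphi(\xi'), \qquad
\frac{\partial I_0''}{\partial \xi'} = -I_{-1}'' = -\varphi(\xi''), \qquad
\frac{\partial I_0'}{\partial \sigma} = \frac{\partial \xi'}{\partial \sigma} = 0,
\tag{8}$$

where $\varphi(\xi)$ is the ordinate of the normal frequency curve; i.e.,
$\varphi(\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}$.

Ordinarily we abbreviate $\varphi(\xi')$ to $\varphi'$ and $\varphi(\xi'')$ to
$\varphi''$. On differentiating (5) we have

$$\frac{\partial \xi''}{\partial \sigma} = -\frac{R}{\sigma^2}, \tag{9}$$

and hence from (8)

$$\frac{\partial I_0''}{\partial \sigma} = \frac{\varphi'' R}{\sigma^2}.$$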

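Taking logarithms of (4), differentiating with the aid of (8) and (9), and
equating to zero, we obtain the maximum likelihood estimating equations

$$\begin{aligned}
\frac{\partial L}{\partial \xi'} &= \frac{n_0(\varphi' - \varphi'')}{I_0' - I_0''}
- \sum_1^{n_0}\left(\xi' + \frac{x_i}{\sigma}\right) = 0, \\
\frac{\partial L}{\partial \sigma} &= \frac{n_0 \varphi''}{I_0' - I_0''} \cdot
\frac{R}{\sigma^2} - \frac{n_0}{\sigma}
+ \frac{1}{\sigma}\sum_1^{n_0} \frac{x_i}{\sigma}
\left(\xi' + \frac{x_i}{\sigma}\right) = 0.
\end{aligned} \tag{10}$$

If we define

$$Z_1 = \frac{\varphi'}{I_0' - I_0''}, \qquad
Z_2 = \frac{\varphi''}{I_0' - I_0''}, \tag{11}$$

and substitute these values in (10), the estimating equations become

$$\begin{aligned}
\sigma[Z_1 - Z_2 - \xi'] - \nu_1 &= 0, \\
\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - Z_2 R/\sigma] - \nu_2 &= 0,
\end{aligned} \tag{12}$$

where $\nu_1$ and $\nu_2$ are the first and second sample moments referred to
the left terminus; i.e., $\nu_k = \sum_1^{n_0} x_i^k / n_0$.
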


To obtain the required estimates $\hat\sigma$ and $\hat\xi'$, it is necessary to
solve the two equations of (12) simultaneously. As illustrated in Section 7,
this can be accomplished without too much difficulty with the aid of the normal
curve tables by using a modified Newton-Raphson method for solving two equations
in two unknowns. This method is described in greater detail by Whittaker and
Robinson [8]. Note that $Z_1$ and $Z_2$, cf. (11), involve only the normal curve
ordinates $\varphi'$ and $\varphi''$ and the areas $I_0'$ and $I_0''$.
Consequently they can be evaluated for any desired values of $\xi'$ and $\sigma$
from standard tables of the normal frequency function. To determine $\hat\mu$,
substitute $\hat\sigma$ and $\hat\xi'$ in (3).
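
The two estimating equations for Case I can also be solved by machine. The following Python sketch is one possible arrangement, not the author's tabular procedure: it replaces the normal curve tables by `math.erfc`, uses a damped Newton iteration with a finite-difference Jacobian, and checks itself against moments obtained by Simpson's rule from the truncated density. All function names, the step size, the tolerances, and the parameter values in the example are assumptions made for illustration.

```python
from math import erfc, exp, pi, sqrt


def phi(t):
    """Ordinate of the standard normal curve."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)


def i0(t):
    """Upper-tail area of the standard normal curve beyond t."""
    return 0.5 * erfc(t / sqrt(2.0))


def residuals(sigma, xi, R, nu1, nu2):
    """Left-hand sides of the two estimating equations for Case I."""
    xi2 = xi + R / sigma                       # right truncation point
    area = i0(xi) - i0(xi2)
    z1, z2 = phi(xi) / area, phi(xi2) / area
    g = z1 - z2 - xi
    return (sigma * g - nu1,
            sigma ** 2 * (1.0 - xi * g - z2 * R / sigma) - nu2)


def truncated_moment(k, sigma, xi, R, panels=2000):
    """k-th moment about the left terminus, by Simpson's rule."""
    area = i0(xi) - i0(xi + R / sigma)
    f = lambda x: x ** k * phi(xi + x / sigma) / (area * sigma)
    h = R / panels
    total = f(0.0) + f(R)
    for j in range(1, panels):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0


def solve_case_one(R, nu1, nu2, sigma, xi, tol=1e-10, max_iter=200):
    """Damped Newton iteration started from the trial values (sigma, xi)."""
    h = 1e-6
    for _ in range(max_iter):
        f1, f2 = residuals(sigma, xi, R, nu1, nu2)
        size = abs(f1) + abs(f2)
        if size < tol:
            break
        a1, a2 = residuals(sigma + h, xi, R, nu1, nu2)
        b1, b2 = residuals(sigma, xi + h, R, nu1, nu2)
        j11, j21 = (a1 - f1) / h, (a2 - f2) / h    # derivatives in sigma
        j12, j22 = (b1 - f1) / h, (b2 - f2) / h    # derivatives in xi
        det = j11 * j22 - j12 * j21
        d_sigma = (-f1 * j22 + f2 * j12) / det
        d_xi = (-f2 * j11 + f1 * j21) / det
        step = 1.0
        while step > 1e-10:                        # halve until improvement
            s_new, x_new = sigma + step * d_sigma, xi + step * d_xi
            if s_new > 0.0:
                g1, g2 = residuals(s_new, x_new, R, nu1, nu2)
                if abs(g1) + abs(g2) < size:
                    break
            step *= 0.5
        sigma, xi = s_new, x_new
    return sigma, xi


# Hypothetical population: sigma = 2, left point at -1 standard units, R = 5.
nu1 = truncated_moment(1, 2.0, -1.0, 5.0)
nu2 = truncated_moment(2, 2.0, -1.0, 5.0)
print(solve_case_one(5.0, nu1, nu2, sigma=1.8, xi=-0.9))
```

In practice the moments would come from the measured observations, and the estimate of the mean would then follow from the left truncation point, the estimated standard deviation, and the estimated standardized truncation point.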

Throughout this paper, we designate the maximum likelihood estimates as
$\hat\mu$, $\hat\sigma$ and $\hat\xi'$ respectively, whereas corresponding
population parameters are designated as $\mu$, $\sigma$, and $\xi'$.


4. Case II. Number of unmeasured observations in each ‘tail’ known. Let
the truncation points, the origin of reference, and the number of measured
observations be designated as for Case I. If we let $n_1$ and $n_2$ be the
number of unmeasured observations in the left and right ‘tails’ respectively,
the likelihood function for a sample of this type is

$$P(x_1, x_2, \ldots, x_{n_0}) = K (1 - I_0')^{n_1} (I_0'')^{n_2}
\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n_0}
e^{-\frac12 \sum_1^{n_0} (\xi' + x_i/\sigma)^2}, \tag{13}$$

where $K$ is a constant.

We take the logarithms of (13), differentiate with the help of (8) and (9), and
equate to zero to obtain the maximum likelihood estimating equations

$$\begin{aligned}
\frac{\partial L}{\partial \xi'} &= n_1 \frac{\varphi'}{1 - I_0'}
- n_2 \frac{\varphi''}{I_0''}
- \sum_1^{n_0}\left(\xi' + \frac{x_i}{\sigma}\right) = 0, \\
\frac{\partial L}{\partial \sigma} &= n_2 \frac{\varphi''}{I_0''} \cdot
\frac{R}{\sigma^2} - \frac{n_0}{\sigma}
+ \frac{1}{\sigma}\sum_1^{n_0} \frac{x_i}{\sigma}
\left(\xi' + \frac{x_i}{\sigma}\right) = 0.
\end{aligned} \tag{14}$$

Let

$$Y_1 = \frac{n_1}{n_0} \cdot \frac{\varphi'}{1 - I_0'}, \qquad
Y_2 = \frac{n_2}{n_0} \cdot \frac{\varphi''}{I_0''}, \tag{15}$$

and (14) can be written as

$$\begin{aligned}
\sigma[Y_1 - Y_2 - \xi'] - \nu_1 &= 0, \\
\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - Y_2 R/\sigma] - \nu_2 &= 0,
\end{aligned} \tag{16}$$

where $\nu_1$ and $\nu_2$ are again the first and second sample moments referred
to the left terminus. The estimating equations (16) correspond to equations (12)
given for Case I, and the manner of solution is the same for both cases. $Y_1$
and $Y_2$ for a given sample are functions of $\xi'$ and $\sigma$ only. They can
be evaluated for any desired values of these variables from ordinary normal
curve tables. As in Case I, the mean is estimated from (3).


5. Case III. Total number of unmeasured observations known, but not the
number in each tail. Again, let the truncation points, the origin of reference,
and the number of measured observations be designated as in the two previous
cases. Let N be the total sample size and hence $N - n_0$ the combined number of
unmeasured observations in both tails. In the notation of Case II, $N - n_0 =
n_1 + n_2$. The likelihood function for a sample of this type is

$$P(x_1, x_2, \cdots, x_N) = K(1 - I_0' + I_0'')^{N-n_0}\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n_0} e^{-\frac{1}{2}\sum_{i=1}^{n_0}(\xi' + x_i/\sigma)^2}. \tag{17}$$

Taking logarithms of (17), differentiating with the assistance of (8) and (9) and
equating to zero, we obtain the maximum likelihood estimating equations

$$\frac{\partial L}{\partial \xi'} = (N - n_0)\frac{\varphi' - \varphi''}{1 - I_0' + I_0''} - \sum_{i=1}^{n_0}\left(\xi' + \frac{x_i}{\sigma}\right) = 0,$$
$$\frac{\partial L}{\partial \sigma} = (N - n_0)\frac{R}{\sigma^2}\,\frac{\varphi''}{1 - I_0' + I_0''} - \frac{n_0}{\sigma} + \frac{1}{\sigma^2}\sum_{i=1}^{n_0} x_i\left(\xi' + \frac{x_i}{\sigma}\right) = 0. \tag{18}$$

In this instance, let

$$Q_1 = \left(\frac{N - n_0}{n_0}\right)\frac{\varphi'}{1 - I_0' + I_0''}, \qquad Q_2 = \left(\frac{N - n_0}{n_0}\right)\frac{\varphi''}{1 - I_0' + I_0''}, \tag{19}$$

and (18) can be written as

$$\hat\sigma[Q_1 - Q_2 - \hat\xi'] - \nu_1 = 0,$$
$$\hat\sigma^2[1 - \hat\xi'(Q_1 - Q_2 - \hat\xi') - Q_2 R/\hat\sigma] - \nu_2 = 0. \tag{20}$$

It will be recognized that equations (20) correspond to (12) and (16) for Cases
I and II respectively. Since the manner of solving the estimating equations is
identical in all three cases, it will not be discussed further here. For any given
sample, $Q_1$ and $Q_2$ are functions of $\xi'$ and $\sigma$ only, and they can be evaluated for
any desired values of these arguments from standard normal curve tables. In
this case also, the mean is estimated from equation (3).
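The estimating equations (12), (16), and (20) share one form and differ only in the ratios (Z, Y, or Q) that enter them. The following Python sketch evaluates their left-hand sides for a trial pair of values. It rests on assumptions not stated in this section: that $I_0'$ and $I_0''$ are the areas of the standard normal curve to the right of $\xi'$ and $\xi''$, that $\xi'' = \xi' + R/\sigma$, and that in Case I the ratios are $Z_1 = \varphi'/(I_0' - I_0'')$ and $Z_2 = \varphi''/(I_0' - I_0'')$. All function names are illustrative.

```python
import math

def phi(t):
    """Ordinate of the standard normal curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_area(t):
    """Area under the standard normal curve to the right of t (assumed meaning of I_0)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def ratios(case, xi1, sigma, R, n0, n1=0, n2=0, N=None):
    """Return (Z1, Z2), (Y1, Y2) or (Q1, Q2) for case 'I', 'II' or 'III'."""
    xi2 = xi1 + R / sigma                      # right terminus in standard units
    p1, p2 = phi(xi1), phi(xi2)
    a1, a2 = upper_area(xi1), upper_area(xi2)
    if case == "I":
        return p1 / (a1 - a2), p2 / (a1 - a2)
    if case == "II":
        return (n1 / n0) * p1 / (1.0 - a1), (n2 / n0) * p2 / a2
    if case == "III":
        k = (N - n0) / n0
        return k * p1 / (1.0 - a1 + a2), k * p2 / (1.0 - a1 + a2)
    raise ValueError("case must be 'I', 'II' or 'III'")

def residuals(case, xi1, sigma, nu1, nu2, R, n0, **counts):
    """Left-hand sides of the two estimating equations; both vanish at the estimates."""
    first, second = ratios(case, xi1, sigma, R, n0, **counts)
    d = first - second - xi1
    return (sigma * d - nu1,
            sigma ** 2 * (1.0 - xi1 * d - second * R / sigma) - nu2)
```

Called with the sample moments of Section 7 and the estimates reported there, for example `residuals("II", -0.941, 1.039, nu1=1.244625, nu2=2.105275, R=2.75, n0=32, n1=7, n2=1)`, both residuals should come out close to zero, which gives a quick check on a hand computation.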


6. First approximations.
Case I. In this case, the following relations will usually provide satisfactory
first approximations for estimating $\sigma$ and $\xi'$:

$$\sigma_1 = s_x, \qquad \xi_1' = -\nu_1/s_x, \tag{21}$$

where $s_x^2$ is the sample variance; i.e., $s_x^2 = (\nu_2 - \nu_1^2)$. It should be remarked
that the only penalty involved in beginning with a poor first approximation is
to increase slightly the number of steps necessary before arriving at a satisfactory
final approximation by the method of Section 7.

Case II. Since $n_1$ and $n_2$ are known in this case, it is more expedient to read
first approximations to $\xi'$ and $\xi''$ directly from standard tables of normal curve
areas where we set

$$\frac{n_1}{n_1 + n_0 + n_2} = 1 - I_0' = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\xi_1'} e^{-t^2/2}\,dt, \tag{22}$$

$$\frac{n_2}{n_1 + n_0 + n_2} = I_0'' = \frac{1}{\sqrt{2\pi}}\int_{\xi_1''}^{\infty} e^{-t^2/2}\,dt. \tag{23}$$

With $\xi_1'$ and $\xi_1''$ determined from (22) and (23), we obtain a first approximation
for estimating $\sigma$, from equation (5), which we now write as

$$\sigma_1 = R/(\xi_1'' - \xi_1'). \tag{24}$$

Case III. For a first approximation in this case, it will usually be satisfactory,
in the absence of contrary information, to assume that the unmeasured
observations are divided equally between the two tails, and then proceed as in Case II.
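The first approximations of this section are easy to compute directly. The sketch below is a minimal illustration in Python: `NormalDist.inv_cdf` from the standard library stands in for reading a table of normal curve areas, and the function names are hypothetical. For Case III it simply applies the equal-split assumption described above.

```python
import math
from statistics import NormalDist

def first_approx_case1(nu1, nu2):
    """Relations (21): sigma_1 = s_x and xi_1' = -nu_1 / s_x."""
    s_x = math.sqrt(nu2 - nu1 ** 2)
    return s_x, -nu1 / s_x

def first_approx_case2(n1, n0, n2, R):
    """Relations (22)-(24): tail proportions give xi_1', xi_1'', then sigma_1."""
    total = n1 + n0 + n2
    std = NormalDist()                       # standard normal, mean 0 and sd 1
    xi1 = std.inv_cdf(n1 / total)            # area to the left of xi' is n1/total
    xi2 = std.inv_cdf(1.0 - n2 / total)      # area to the right of xi'' is n2/total
    return R / (xi2 - xi1), xi1, xi2

def first_approx_case3(N, n0, R):
    """Case III: split the N - n0 unmeasured observations equally between the tails."""
    half = (N - n0) / 2.0
    return first_approx_case2(half, n0, half, R)
```

For the sample of Section 7, `first_approx_case2(7, 32, 1, 2.75)` should give values close to the first approximations quoted there for Case II. Both tail counts must be positive for `inv_cdf` to be defined.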


7. Numerical examples. As previously mentioned, a modified Newton-Raphson
method for solving two equations in two unknowns is satisfactory in
each of the three cases considered, for solving the estimating equations to obtain
$\hat\sigma$ and $\hat\xi'$ in practical applications. A random sample from a normal population
with $\mu = 0$, and $\sigma = 1$, selected from Mahalanobis's tables [9] will serve to
illustrate the solution in each case.

Case I. For the sample selected, $n_0 = 32$; $\nu_1 = 1.244625$; $\nu_2 = 2.105275$;
$x_0' = -1.000000$; and $R = 2.750000$. The estimating equations to be solved
simultaneously for $\hat\xi'$ and $\hat\sigma$ are thus

$$\hat\sigma[Z_1 - Z_2 - \hat\xi'] - 1.244625 = 0,$$
$$\hat\sigma^2[1 - \hat\xi'(Z_1 - Z_2 - \hat\xi') - 2.750000\,Z_2/\hat\sigma] - 2.105275 = 0.$$

For first approximations, we employ (21) to obtain $\sigma_1 = s_x = 0.75$; and $\xi_1' =
-1.244625/0.75 = -1.66$. Beginning with these approximations, we subsequently
obtain the results displayed in Table 1.

TABLE 1
Solution of estimating equations in Case I

$\sigma$      $\xi'$ from $\nu_1$    $\xi'$ from $\nu_2$    Difference
1.536313      -0.5389                -0.5387                -0.0002
1.527778      -0.5455                -0.5460                +0.0005

Interpolating in this table, we obtain $\hat\sigma = 1.534$ and $\hat\xi' = -0.541$. On substituting
these values in (3) we obtain $\hat\mu = -0.170$. Even though the first approximations
in this instance proved to be considerably in error, no appreciable increase was
experienced in the number of steps necessary to arrive at the final values given.
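The Case I solution can also be sketched as a direct iteration. The Python code below applies a plain Newton-Raphson step with a finite-difference Jacobian; it is not the modified tabular procedure of Whittaker and Robinson [8] used in the paper. It assumes $Z_1 = \varphi'/(I_0' - I_0'')$ and $Z_2 = \varphi''/(I_0' - I_0'')$ with upper-tail areas, and it assumes that relation (3) reads $\hat\mu = x_0' - \hat\sigma\hat\xi'$, which agrees with the three numerical results quoted in this section. The function names are illustrative.

```python
import math

def phi(t):
    """Ordinate of the standard normal curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_area(t):
    """Area under the standard normal curve to the right of t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def case1_equations(sigma, xi1, nu1, nu2, R):
    """Left-hand sides of the two Case I estimating equations."""
    xi2 = xi1 + R / sigma
    mass = upper_area(xi1) - upper_area(xi2)   # area between the two termini
    z1, z2 = phi(xi1) / mass, phi(xi2) / mass
    d = z1 - z2 - xi1
    return (sigma * d - nu1,
            sigma ** 2 * (1.0 - xi1 * d - z2 * R / sigma) - nu2)

def newton_two_unknowns(f, x, y, tol=1e-10, max_iter=50, h=1e-6):
    """Plain Newton-Raphson for f(x, y) = (0, 0) with a central-difference Jacobian."""
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        if abs(f1) + abs(f2) < tol:
            break
        fxp, fxm = f(x + h, y), f(x - h, y)
        fyp, fym = f(x, y + h), f(x, y - h)
        j11 = (fxp[0] - fxm[0]) / (2 * h)
        j21 = (fxp[1] - fxm[1]) / (2 * h)
        j12 = (fyp[0] - fym[0]) / (2 * h)
        j22 = (fyp[1] - fym[1]) / (2 * h)
        det = j11 * j22 - j12 * j21
        # Solve the 2 x 2 linear system J * step = f by Cramer's rule.
        x -= (f1 * j22 - f2 * j12) / det
        y -= (j11 * f2 - j21 * f1) / det
    return x, y
```

A possible call is `newton_two_unknowns(lambda s, x: case1_equations(s, x, 1.244625, 2.105275, 2.75), 1.5, -0.5)`. The starting point here is chosen near the tabulated solution; an undamped iteration of this kind is not guaranteed to converge from a poor first approximation.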

Case II. Solution of estimating equations (16) for this case can also be
illustrated with the same sample which was used in Case I. In this instance, however,
we have the additional information: $n_1 = 7$ and $n_2 = 1$. The equations to be
solved are:

$$\hat\sigma[Y_1 - Y_2 - \hat\xi'] - 1.244625 = 0,$$
$$\hat\sigma^2[1 - \hat\xi'(Y_1 - Y_2 - \hat\xi') - 2.750000\,Y_2/\hat\sigma] - 2.105275 = 0.$$

From (22), (23) and (24) we obtain the first approximations: $\xi_1' = -0.935$;
$\xi_1'' = 1.960$; and hence $\sigma_1 = 0.950$. Beginning with these values, we proceed as
in Case I, and after several trials obtain the results displayed in Table 2.

TABLE 2
Solution of estimating equations in Case II

$\sigma$      $\xi'$ from $\nu_1$    $\xi'$ from $\nu_2$    Difference
1.041667      -0.9381                -0.9360                -0.0021
1.000000      -0.9820                -1.0094                +0.0274

Interpolating, we have $\hat\sigma = 1.039$ and $\hat\xi' = -0.941$. From (3) we then obtain
$\hat\mu = -0.022$.

Case III. Again we use the same sample that was employed to illustrate
Cases I and II. In this instance, however, we assume that the only information
available about the unmeasured observations is that their total number is 8.
In the notation of Section 5, we have $N = 40$, $n_0 = 32$, and hence $N - n_0 = 8$.
The estimating equations in this situation are

$$\hat\sigma[Q_1 - Q_2 - \hat\xi'] - 1.244625 = 0,$$
$$\hat\sigma^2[1 - \hat\xi'(Q_1 - Q_2 - \hat\xi') - 2.750000\,Q_2/\hat\sigma] - 2.105275 = 0.$$

Under the assumption that 4 unmeasured observations are in each 'tail',
equations (22), (23) and (24) give first approximations: $\xi_1' = -1.28$; $\xi_1'' = 1.28$;
and hence $\sigma_1 = 1.074$. Starting with these values and proceeding as in the two
previous cases, we obtain the results displayed in Table 3.

TABLE 3
Solution of estimating equations in Case III

$\sigma$      $\xi'$ from $\nu_1$    $\xi'$ from $\nu_2$    Difference
1.000000      -1.0794                -1.2091                +0.1297
1.100000      -1.0118                -0.9739                -0.0379

By interpolation, we have $\hat\sigma = 1.077$ and $\hat\xi' = -1.027$. From equation (3),
we then compute $\hat\mu = 0.106$.
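The final interpolation step in Tables 1, 2, and 3 can be reproduced with a few lines of arithmetic. The sketch below, in Python, interpolates linearly between two tabulated rows to the point where the two values of $\xi'$ agree. It assumes that relation (3), which is not restated in this section, has the form $\hat\mu = x_0' - \hat\sigma\hat\xi'$; the function names are illustrative.

```python
def interpolate_rows(row_a, row_b):
    """Each row is (sigma, xi' from nu_1, xi' from nu_2).

    Returns the sigma and xi' at which the difference between the two
    xi' columns is zero, by linear interpolation between the rows.
    """
    s_a, p_a, q_a = row_a
    s_b, p_b, q_b = row_b
    d_a, d_b = p_a - q_a, p_b - q_b          # the 'Difference' column
    w = d_a / (d_a - d_b)                    # fraction of the way from row a to row b
    return s_a + w * (s_b - s_a), p_a + w * (p_b - p_a)

def mean_estimate(x0, sigma, xi1):
    """Assumed form of relation (3): mu = x0' - sigma * xi'."""
    return x0 - sigma * xi1
```

For Table 1, `interpolate_rows((1.536313, -0.5389, -0.5387), (1.527778, -0.5455, -0.5460))` followed by `mean_estimate(-1.0, sigma, xi1)` should agree with the quoted estimates to about three decimals.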


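8. Precision of estimates. To determine asymptotic variances of $\hat\sigma$ and $\hat\xi'$, we
construct the variance-covariance matrices. This requires that we obtain the
second partial derivatives of logarithms of the likelihood function in each of
the three cases considered. Results stated in (8) and (9) are involved in these
derivatives.

Case I. The second partial derivatives in this case are

$$\frac{\partial^2 L}{\partial \xi'^2} = n_0 f_1(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \frac{n_0}{\sigma} f_2(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \sigma^2} = \frac{n_0}{\sigma^2} f_3(\xi', \xi''), \tag{25}$$

where

$$f_1(\xi', \xi'') = -[1 + \xi' Z_1 - \xi'' Z_2 - (Z_1 - Z_2)^2],$$
$$f_2(\xi', \xi'') = \frac{R}{\sigma} Z_2[(Z_1 - Z_2) - \xi''] + [Z_1 - Z_2 - \xi'], \tag{26}$$
$$f_3(\xi', \xi'') = \left(\frac{R}{\sigma}\right)^2 Z_2(Z_2 + \xi'') - \left[2 - \xi'(Z_1 - Z_2 - \xi') - Z_2\frac{R}{\sigma}\right].$$

Subsequently we obtain

$$V(\hat\sigma) = \frac{\sigma^2}{n_0}\left[\frac{-f_1}{f_1 f_3 - f_2^2}\right], \qquad V(\hat\xi') = \frac{1}{n_0}\left[\frac{-f_3}{f_1 f_3 - f_2^2}\right], \qquad r_{\hat\sigma,\hat\xi'} = \frac{f_2}{\sqrt{f_1 f_3}}. \tag{27}$$

Case II. In this case the second partial derivatives are

$$\frac{\partial^2 L}{\partial \xi'^2} = n_0 g_1(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \frac{n_0}{\sigma} g_2(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \sigma^2} = \frac{n_0}{\sigma^2} g_3(\xi', \xi''), \tag{28}$$

where

$$g_1(\xi', \xi'') = -\left[1 + \xi' Y_1 - \xi'' Y_2 + \frac{n_0}{n_2} Y_2^2 + \frac{n_0}{n_1} Y_1^2\right],$$
$$g_2(\xi', \xi'') = \frac{R}{\sigma} Y_2\left[\frac{n_0}{n_2} Y_2 - \xi''\right] + [Y_1 - Y_2 - \xi'], \tag{29}$$
$$g_3(\xi', \xi'') = \left(\frac{R}{\sigma}\right)^2 Y_2\left(\xi'' - \frac{n_0}{n_2} Y_2\right) - \left[2 - \xi'(Y_1 - Y_2 - \xi') - Y_2\frac{R}{\sigma}\right].$$

Finally we can write

$$V(\hat\sigma) = \frac{\sigma^2}{n_0}\left[\frac{-g_1}{g_1 g_3 - g_2^2}\right], \qquad V(\hat\xi') = \frac{1}{n_0}\left[\frac{-g_3}{g_1 g_3 - g_2^2}\right], \qquad r_{\hat\sigma,\hat\xi'} = \frac{g_2}{\sqrt{g_1 g_3}}. \tag{30}$$

Case III. This time, the second partial derivatives are

$$\frac{\partial^2 L}{\partial \xi'^2} = n_0 h_1(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \xi'\,\partial \sigma} = \frac{n_0}{\sigma} h_2(\xi', \xi''), \qquad \frac{\partial^2 L}{\partial \sigma^2} = \frac{n_0}{\sigma^2} h_3(\xi', \xi''), \tag{31}$$

where

$$h_1(\xi', \xi'') = -\left[1 + \xi' Q_1 - \xi'' Q_2 + \frac{n_0}{N - n_0}(Q_1 - Q_2)^2\right],$$
$$h_2(\xi', \xi'') = -\frac{R}{\sigma} Q_2\left[\frac{n_0}{N - n_0}(Q_1 - Q_2) + \xi''\right] + [Q_1 - Q_2 - \xi'], \tag{32}$$
$$h_3(\xi', \xi'') = \left(\frac{R}{\sigma}\right)^2 Q_2\left(\xi'' - \frac{n_0}{N - n_0} Q_2\right) - \left[2 - \xi'(Q_1 - Q_2 - \xi') - Q_2\frac{R}{\sigma}\right].$$

Accordingly we obtain

$$V(\hat\sigma) = \frac{\sigma^2}{n_0}\left[\frac{-h_1}{h_1 h_3 - h_2^2}\right], \qquad V(\hat\xi') = \frac{1}{n_0}\left[\frac{-h_3}{h_1 h_3 - h_2^2}\right], \qquad r_{\hat\sigma,\hat\xi'} = \frac{h_2}{\sqrt{h_1 h_3}}. \tag{33}$$

Note that variances of the estimates for each case considered can be computed
for given values of $\xi'$ and $\sigma$ from standard normal tables of areas and ordinates.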
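As an illustration of how the Case I precision formulas might be evaluated without tables, the Python sketch below computes $f_1$, $f_2$, $f_3$ and from them the two variances and the correlation. It assumes the readings $f_1 = -[1 + \xi'Z_1 - \xi''Z_2 - (Z_1 - Z_2)^2]$, $f_2 = (R/\sigma)Z_2[(Z_1 - Z_2) - \xi''] + [Z_1 - Z_2 - \xi']$, and $f_3 = (R/\sigma)^2 Z_2(Z_2 + \xi'') - [2 - \xi'(Z_1 - Z_2 - \xi') - Z_2R/\sigma]$, with $V(\hat\sigma) = (\sigma^2/n_0)[-f_1/(f_1f_3 - f_2^2)]$, $V(\hat\xi') = (1/n_0)[-f_3/(f_1f_3 - f_2^2)]$, and $r = f_2/\sqrt{f_1f_3}$. It also assumes upper-tail areas for $I_0'$ and $I_0''$. The function names are illustrative.

```python
import math

def phi(t):
    """Ordinate of the standard normal curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_area(t):
    """Area under the standard normal curve to the right of t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def case1_precision(sigma, xi1, R, n0):
    """Return (V(sigma_hat), V(xi_hat'), correlation) for Case I."""
    k = R / sigma
    xi2 = xi1 + k
    mass = upper_area(xi1) - upper_area(xi2)
    z1, z2 = phi(xi1) / mass, phi(xi2) / mass
    d = z1 - z2
    f1 = -(1.0 + xi1 * z1 - xi2 * z2 - d * d)
    f2 = k * z2 * (d - xi2) + (d - xi1)
    f3 = k * k * z2 * (z2 + xi2) - (2.0 - xi1 * (d - xi1) - z2 * k)
    det = f1 * f3 - f2 * f2
    var_sigma = (sigma ** 2 / n0) * (-f1 / det)
    var_xi = (1.0 / n0) * (-f3 / det)
    corr = f2 / math.sqrt(f1 * f3)
    return var_sigma, var_xi, corr
```

In practice the population values are unknown, so the estimates would be substituted, for example `case1_precision(1.534, -0.541, 2.75, 32)` for the sample of Section 7.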





9. Singly truncated samples. If only the left 'tail' is missing from the samples
thus far considered, then $\xi'' = \infty$, $n_2 = 0$, $\varphi'' = 0$, $I_0'' = 0$, and hence $Z_2$, $Y_2$,
and $Q_2$ each equal zero. Upon substituting these values in (12), (16), and (20)
respectively, estimating equations applicable to singly truncated samples are
obtained as special cases of the estimating equations for doubly truncated
samples. Of course Cases II and III become identical when samples are singly
truncated. When $Y_2 = Q_2 = 0$, then $Y_1 = Q_1$, cf. (15) and (19).

Case I. With $Z_2 = 0$, the estimating equations (12) become

$$\hat\sigma[Z_1 - \hat\xi'] = \nu_1,$$
$$\hat\sigma^2[1 - \hat\xi'(Z_1 - \hat\xi')] = \nu_2. \tag{34}$$

Eliminating $\hat\sigma$ between these two equations we have

$$\frac{\nu_2}{\nu_1^2} = \frac{1}{Z_1 - \hat\xi'}\left[\frac{1}{Z_1 - \hat\xi'} - \hat\xi'\right], \tag{35}$$

which is recognized as the Pearson-Lee-Fisher equation in a form which was
previously given by the author [5].
Case II. With $Y_2 = 0$, the estimating equations (16) become

$$\hat\sigma[Y_1 - \hat\xi'] = \nu_1,$$
$$\hat\sigma^2[1 - \hat\xi'(Y_1 - \hat\xi')] = \nu_2. \tag{36}$$

Eliminating $\hat\sigma$ between the above equations, we obtain

$$\frac{\nu_2}{\nu_1^2} = \frac{1}{Y_1 - \hat\xi'}\left[\frac{1}{Y_1 - \hat\xi'} - \hat\xi'\right], \tag{37}$$
which is in a form completely analogous to (35). Furthermore, this equation
can be solved for $\hat\xi'$ in the same manner as (35), cf. [5]. Since $\sigma$ can be eliminated
between estimating equations in singly truncated cases, but not in doubly
truncated cases, the numerical computations are much simpler and less laborious
for singly truncated samples.

[Figure 1: curves of the weighting factors W (n/N known) and W' (n/N unknown); vertical axis labeled "Weighting factors".]
Fig. 1. Weighting factors for use in determining the variance of $\hat\sigma$.
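Because $\sigma$ drops out in the singly truncated case, only one unknown remains. The Python sketch below solves the Case I equation for a sample truncated on the left by bisection. It assumes that equation (35) reads $\nu_2/\nu_1^2 = [1/(Z_1 - \xi')][1/(Z_1 - \xi') - \xi']$, that $Z_1 = \varphi'/I_0'$ when $I_0'' = 0$, and that $I_0'$ is the area to the right of $\xi'$. It also relies on the right-hand side being monotone in $\xi'$ over the search interval, which is an assumption of this sketch. The function names are illustrative.

```python
import math

def phi(t):
    """Ordinate of the standard normal curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_area(t):
    """Area under the standard normal curve to the right of t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def moment_ratio(xi1):
    """Right-hand side of (35) as a function of xi'."""
    z1 = phi(xi1) / upper_area(xi1)
    u = 1.0 / (z1 - xi1)
    return u * (u - xi1)

def solve_singly_truncated(nu1, nu2, lo=-4.0, hi=4.0, tol=1e-10):
    """Return (sigma_hat, xi_hat') for a sample truncated on the left only."""
    target = nu2 / nu1 ** 2
    f_lo = moment_ratio(lo) - target
    f_hi = moment_ratio(hi) - target
    if f_lo * f_hi > 0.0:
        raise ValueError("moment ratio lies outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = moment_ratio(mid) - target
        if f_lo * f_mid <= 0.0:
            hi = mid                      # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid         # root lies in the upper half
    xi1 = 0.5 * (lo + hi)
    z1 = phi(xi1) / upper_area(xi1)
    return nu1 / (z1 - xi1), xi1          # sigma from the first equation of (34)
```

The search interval of -4 to 4 is an arbitrary choice for illustration; a moment ratio outside the range covered by that interval raises an error rather than returning a misleading value.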
If the right rather than the left tail is missing from singly truncated samples,
applicable estimating equations can be obtained from (12) and (16) by translating
the origin to the terminus on the right and setting $Z_1$ and $Y_1$ equal to zero
rather than $Z_2$ and $Y_2$.


[Figure 2: curves of the weighting factors w (n/N known), w' (n/N unknown), and w'' (estimated from n/N only); vertical axis "Weighting factors", marked from 0.7 to 100; horizontal axis $\xi'$, marked from -3 to 3.]
Fig. 2. Weighting factors for use in determining the variances of $\hat\xi'$ and $\xi^*$.
The variance formulas (25) and (28) likewise assume more simple forms with
singly truncated samples. Substitute $Z_2 = 0$ in (25) and the variance formulas
applicable with singly truncated samples when the number of unmeasured
observations is unknown, become identical in form with those previously given
by the writer in [5]. When the number of unmeasured observations in a singly
truncated sample is known, the applicable variance formulas (28), on setting
$Y_2 = 0$, become

$$V(\hat\sigma) = \frac{\sigma^2}{n_0} W(\xi') \quad\text{and}\quad V(\hat\xi') = \frac{1}{n_0} w(\xi'), \tag{38}$$

where W and w may be regarded as weighting functions defined by

$$W(\xi') = \frac{1 + Y_1(Y_1 n_0/n_1 + \xi')}{[2 - \xi'(Y_1 - \xi')][1 + Y_1(Y_1 n_0/n_1 + \xi')] - (Y_1 - \xi')^2}, \tag{39}$$

and

$$w(\xi') = \frac{2 - \xi'(Y_1 - \xi')}{[2 - \xi'(Y_1 - \xi')][1 + Y_1(Y_1 n_0/n_1 + \xi')] - (Y_1 - \xi')^2}. \tag{40}$$

Similarly, the correlation between sampling errors of $\hat\sigma$ and $\hat\xi'$ in this case becomes

$$r_{\hat\sigma,\hat\xi'} = \frac{Y_1 - \xi'}{\sqrt{[2 - \xi'(Y_1 - \xi')][1 + Y_1(Y_1 n_0/n_1 + \xi')]}}. \tag{41}$$

A comparison of the variances (38), with those applicable when the number of
unmeasured observations is unknown, serves to indicate the extent to which
information contained in a singly truncated sample is increased by adding
knowledge of the number of unmeasured observations. To facilitate such
comparisons, W, w, and corresponding functions W' and w' applicable when the
number of unmeasured observations is unknown, are displayed graphically in
Figures 1 and 2. In computing the plotted values of W and w, the ratio n/N
in (39) and (40) was replaced by $I_0'$. This ratio is, of course, an estimate of $I_0'$,
and for n and N sufficiently large, the substitution is amply justified. Equations
for W' and w' can be found in [5]. For further comparisons, a graph of w''
applicable in determining the variance $V(\xi^*)$, where $\xi^*$ is estimated from n/N alone
is also included in Figure 2. This latter function is defined as

$$w''(\xi^*) = \frac{I_0'^{\,2}(1 - I_0')}{\varphi'^2}. \tag{42}$$

It follows from the well known formula for the variance of $\xi^*$:

$$V(\xi^*) = \frac{1}{N}\,\frac{I_0'(1 - I_0')}{\varphi'^2} = \frac{1}{n}\,\frac{I_0'^{\,2}(1 - I_0')}{\varphi'^2}. \tag{43}$$


An examination of Figures 1 and 2 discloses that except when the omitted
portion of the distribution is small ($\xi' < -3$), the variances of the estimates of
$\sigma$ and $\xi'$ based on singly truncated normal samples are substantially less when
the number of unmeasured observations is known than when this information
is lacking.
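The weighting factors for the known-count case can be tabulated directly. The Python sketch below evaluates W, w, and the correlation of $\hat\sigma$ and $\hat\xi'$ for a given $\xi'$, with the ratio n/N replaced by the area $I_0'$ as described above. The expressions used are the ones that follow from the Case II second derivatives on setting $Y_2 = 0$, and the form $w'' = I_0'^2(1 - I_0')/\varphi'^2$ is taken from the usual large-sample variance of a quantile estimated from a proportion; both should be read as assumptions of this sketch. The function names are illustrative.

```python
import math

def phi(t):
    """Ordinate of the standard normal curve at t."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def upper_area(t):
    """Area under the standard normal curve to the right of t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def weights_count_known(xi1):
    """Return (W, w, r) at xi', with n0/N replaced by the area I_0'."""
    area = upper_area(xi1)                 # expected fraction of measured observations
    n0_over_n1 = area / (1.0 - area)
    y1 = phi(xi1) / area                   # Y1 = (n1/n0) * phi' / (1 - I_0')
    a = 1.0 + y1 * (y1 * n0_over_n1 + xi1)
    b = y1 - xi1
    c = 2.0 - xi1 * b
    det = a * c - b * b
    return a / det, c / det, b / math.sqrt(a * c)

def weight_from_count_only(xi1):
    """w'' for estimating xi* from the proportion n/N alone."""
    area = upper_area(xi1)
    return area ** 2 * (1.0 - area) / phi(xi1) ** 2
```

Evaluating these functions over a grid of $\xi'$ from -3 to 3 gives curves that can be compared with the plotted ones; the functions W' and w' for the unknown-count case are given in [5] and are not reproduced here.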
























REFERENCES

[1] K. Pearson and A. Lee, "On the generalized probable error in multiple normal
correlation", Biometrika, Vol. 6 (1908), pp. 59-68.

[2] A. Lee, "Table of Gaussian 'tail' functions when the 'tail' is larger than the body",
Biometrika, Vol. 10 (1915), pp. 208-215.

[3] R. A. Fisher, "Properties and applications of Hh functions", Mathematical Tables,
Vol. 1, pp. xxvi-xxxv, British Association for the Advancement of Science, 1931.

[4] A. Hald, "Maximum likelihood estimation of the parameters of a normal distribution
which is truncated at a known point", Skandinavisk Aktuarietidskrift, Vol. 32
(1949), pp. 119-134.

[5] A. C. Cohen, Jr., "On estimating the mean and standard deviation of truncated normal
distributions", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 518-525.

[6] W. L. Stevens, "The truncated normal distribution", appendix to "The Calculation
of the Time-Mortality Curve" by C. I. Bliss, Annals of Applied Biology, Vol. 24
(1937), pp. 815-852.

[7] W. G. Cochran, "Use of IBM equipment in an investigation of the truncated normal
problem", Proc. Research Forum, International Business Machines Corp., 1946,
pp. 40-43.

[8] E. T. Whittaker and G. Robinson, The Calculus of Observations, Second Ed., Blackie
and Son, Ltd., London and Glasgow, 1929, pp. 88-91.

[9] P. C. Mahalanobis, "Tables of random samples from a normal population", Sankhyā,
Vol. 1 (1934), pp. 289-328.





THE ASYMPTOTIC PROPERTIES OF ESTIMATES OF THE
PARAMETERS OF A SINGLE EQUATION IN A COMPLETE
SYSTEM OF STOCHASTIC EQUATIONS¹,²


By T. W. ANDERSON³ AND HERMAN RUBIN⁴
Columbia University and Institute for Advanced Study


1. Summary. In a previous paper [2] the authors have given a method for 
estimating the coefficients of a single equation in a complete system of linear 
stochastic equations. In the present paper the consistency of the estimates and 
the asymptotic distributions of the estimates and the test criteria are studied 
under conditions more general than those used in the derivation of these estimates 
and criteria. The point estimates, which can be obtained as maximum likelihood 
estimates under certain assumptions including that of normality of disturbances, 
are consistent even if the disturbances are not normally distributed and (a) some 
predetermined variables are neglected (Theorem 1) or (b) the single equation is 
in a non-linear system with certain properties (Theorem 2). 

Under certain general conditions (normality of the disturbances not being 
required) the estimates are asymptotically normally distributed (Theorems 3 
and 4). The asymptotic covariance matrix is given for several cases. The criteria 
derived in [2] for testing the hypothesis of over-identification have,
asymptotically, $\chi^2$-distributions (Theorem 5). The exact confidence regions developed
in [2] for the case that all predetermined variables are exogenous (that is, that 
the difference equations are of zero order) are shown to be consistent and to hold 
asymptotically even when this assumption is not true (Theorem 6). 

2. Introduction. The complete system of linear stochastic equations
considered by the authors in [2] was written

$$B_{yy}\,y_t' + \Gamma_{yz}\,z_t' = \epsilon_t', \tag{2.1}$$

where $y_t$ is a row vector of G jointly dependent variables at "time" t, $z_t$ is a row
vector of K variables predetermined at t, and $\epsilon_t$ is a row vector of "disturbances,"
and $B_{yy}$ and $\Gamma_{yz}$ are matrices. If $B_{yy}$ is non-singular the distribution of $\epsilon_t$ induces
the distribution of $y_t$ given $z_t$.

One component equation of (2.1) was given special treatment. Let $\beta$ be


1 This paper will be included in Cowles Commission Papers, New Series, No. 36. 

2 The results of this paper were presented to meetings of the Institute of Mathematical 
Statistics at Washington, D. C., April 12, 1946 (Washington Chapter) and at Ithaca, New 
York, August 23, 1946. Most of the research was done at the Cowles Commission for Re- 
search in Economics; the authors are indebted to the members of the Cowles Commission 
staff for many helpful discussions. 

3 Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant 
of the Cowles Commission for Research in Economics. 

4 National Research Fellow; Research Consultant of the Cowles Commission for Re- 
search in Economics. 




ASYMPTOTIC PROPERTIES OF CERTAIN ESTIMATES 571 


composed of the coefficients of the coordinates of $y_t$ which are not assumed
zero in the specified equation, and let $x_t$ be composed of the corresponding
components of $y_t$; similarly let $\gamma$ be composed of the coefficients of the coordinates
of $z_t$ which are not assumed zero, and $u_t$ the corresponding components of $z_t$;
and let $\zeta_t$ be the component of $\epsilon_t$ associated with the specified equation. Then
the single equation is

$$\beta x_t' + \gamma u_t' = \zeta_t. \tag{2.2}$$

Suppose we have a set of observations $x_t$, $z_t$, $t = 1, \cdots, T$. For sets of any
two vectors $a_t$ and $b_t$, let the second-order moment matrix be

$$M_{ab} = \frac{1}{T}\sum_{t=1}^{T} a_t' b_t. \tag{2.3}$$

Let $s_t$ be some linear transform of $v_t$, the set of coordinates of $z_t$ not contained in
$u_t$, chosen so $M_{us} = 0$. Defining

$$W_{xx} = M_{xx} - M_{xz} M_{zz}^{-1} M_{zx}, \tag{2.4}$$

and assuming $\epsilon_t$ normally distributed with mean 0, covariance matrix $\Sigma$, and
independently of $\epsilon_{t'}$ ($t \neq t'$), we find $\hat\beta$, the maximum likelihood estimate of $\beta$,
to be proportional to a vector $b$ defined by

$$(M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx})\,b' = 0, \tag{2.5}$$

taking $\nu$ as the smallest root of

$$|M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx}| = 0. \tag{2.6}$$

The vector is normalized by

$$\hat\beta\,\Phi_{xx}\,\hat\beta' = 1, \tag{2.7}$$

where $\Phi_{xx}$ may be a function of the estimates of other parameters. The estimate
of $\gamma$ is $\hat\gamma = -\hat\beta M_{xu} M_{uu}^{-1}$ [2; Theorem 1]. These estimates were derived under the
following explicit Assumptions A, B, C, and D:

ASSUMPTION A. The selected structural equation (2.2) is one equation of a complete
linear system of stochastic equations. It is identified by the fact that if H is the
number of coordinates in $x_t$, there are at least H - 1 coordinates in $v_t$, the vector of
predetermined variables in the system, but missing in (2.2).

ASSUMPTION B. At time t all of the coordinates of $z_t = (u_t, v_t)$ are given.

ASSUMPTION C. The coordinates of $z_t$ are given functions of exogenous variables
and of coordinates of $y_{t-1}, y_{t-2}, \cdots$. If coordinates of $y_0, y_{-1}, \cdots$ are involved in
$z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is non-singular
with probability one.

ASSUMPTION D. The disturbance vectors $\epsilon_t$ are distributed serially independently
and normally with mean zero and covariance matrix $\Sigma$.

Under these assumptions it is found that $(1 + \nu)^{-T/2}$ is the likelihood ratio








572 T. W. ANDERSON AND HERMAN RUBIN 


criterion for testing the hypothesis that the number of components of $z_t$ assumed
to have zero coefficients is so great.

If there are no lagged endogenous variables in $z_t$, we can find confidence
regions for $\beta$ and for $\beta$ and $\gamma$ simultaneously as well as an approximate test for
the above hypothesis. The assumptions used for these results are A, B, and

ASSUMPTION E. All the coordinates of $z_t = (u_t, v_t)$ are exogenous. The moment
matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are distributed
independently and normally with mean zero and variance $\sigma^2$.

Assumptions A and B are used in this paper and a number in addition,
which will be lettered similarly. It is to be emphasized that the various
assumptions are used alternatively, never all at once; in fact many assumptions are
mutually exclusive.
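The computation behind the estimate of $\beta$ is a smallest-root problem for a pair of matrices. As a minimal sketch, the Python code below treats the case of two coordinates in $x_t$ (H = 2), where the determinantal equation is a quadratic. Here `A` stands for the matrix $M_{xs}M_{ss}^{-1}M_{sx}$ and `B` for $W_{xx}$; the vector is normalized to unit length, which corresponds to taking the normalizing matrix equal to the identity. That choice, and the function names, are assumptions made for illustration.

```python
import math

def smallest_root_2x2(A, B):
    """Smallest root nu of |A - nu*B| = 0 for symmetric 2x2 A and positive definite B."""
    a11, a12, a22 = A[0][0], A[0][1], A[1][1]
    b11, b12, b22 = B[0][0], B[0][1], B[1][1]
    qa = b11 * b22 - b12 * b12                        # coefficient of nu^2, equals |B|
    qb = -(a11 * b22 + a22 * b11 - 2.0 * a12 * b12)   # coefficient of nu
    qc = a11 * a22 - a12 * a12                        # constant term, equals |A|
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)          # guard against rounding below zero
    return (-qb - math.sqrt(disc)) / (2.0 * qa)

def null_vector_2x2(A, B, nu):
    """Unit-length vector b with (A - nu*B) b' = 0, assuming nu is a simple root."""
    m11 = A[0][0] - nu * B[0][0]
    m12 = A[0][1] - nu * B[0][1]
    m22 = A[1][1] - nu * B[1][1]
    # Use whichever row of A - nu*B is larger, for numerical stability.
    if abs(m11) + abs(m12) >= abs(m12) + abs(m22):
        b = (-m12, m11)
    else:
        b = (-m22, m12)
    norm = math.hypot(b[0], b[1])
    return b[0] / norm, b[1] / norm
```

For larger H the same problem is a generalized symmetric eigenvalue problem, which would normally be handed to a linear algebra library rather than solved by hand.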

3. Consistency of the estimates. The estimates $\hat\beta$ and $\hat\gamma$ are consistent not
only in the case for which they are maximum likelihood estimates, but also in
cases in which the disturbances are not normally or even identically distributed.
Moreover, for consistency of the estimates it is not necessary that the investigator
know all of the components of $v_t$ or use them. Another direction in which the
assumptions may be relaxed is to permit the other equations in the system to be
non-linear.

3.1. The linear case. This case is characterized by Assumption A. We need
also to assume:

ASSUMPTION F. $M_{zz}$ converges to a fixed non-singular limit R in probability.

Let $u_t$ consist of the part of $z_t$ that enters the selected structural equation (2.2).
The remainder of the components of $z_t$ are divided into two groups as to whether
they are known or not. Let $c_t$ be a linear transform of the known components
not entering the specified equation such that

$$\operatorname*{plim}_{T\to\infty} M_{uc} = 0, \tag{3.1}$$

and let $r_t$ be a linear transform of the components of $z_t$ not known such that

$$\operatorname*{plim}_{T\to\infty} M_{ur} = 0, \tag{3.2}$$

$$\operatorname*{plim}_{T\to\infty} M_{cr} = 0. \tag{3.3}$$

The relevant part of the "reduced form," obtained from (2.1) by multiplication
by $B_{yy}^{-1}$, is

$$x_t' = \Pi_{xu} u_t' + \Pi_{xc} c_t' + \Pi_{xr} r_t' + \delta_t'. \tag{3.4}$$

The matrix $(\Pi_{xc}\ \Pi_{xr})$ is $\Pi_{xs}$ (defined in [2]) multiplied on the right by a
non-singular matrix; hence, $\beta\Pi_{xc} = 0$, and similarly $\beta\Pi_{xu} = -\gamma$. We shall find it
convenient to assume

ASSUMPTION G. $\Pi_{xc}$ has rank H - 1.
This means that for T sufficiently large the probability is arbitrarily near 1
that (2.2) is identified.


However, these conditions still do not insure consistency. We need the
asymptotic analogue of lack of correlation:
ASSUMPTION H.

$$\operatorname*{plim}_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} \delta_t' z_t = 0.$$

We do not need to require that the covariance matrices of $\delta_t$ are the same or
even that they exist. We shall make an assumption about

$$W_{xx} = M_{xx} - (M_{xu}\ M_{xc})\begin{pmatrix} M_{uu} & M_{uc} \\ M_{cu} & M_{cc} \end{pmatrix}^{-1}\begin{pmatrix} M_{ux} \\ M_{cx} \end{pmatrix}. \tag{3.5}$$

ASSUMPTION I. The ratio of the largest to the smallest characteristic roots of $W_{xx}$
is bounded in probability.
This means that for a suitable constant K

$$\lim_{T\to\infty} P\left(\frac{l(W_{xx})}{s(W_{xx})} > K\right) = 0, \tag{3.6}$$

where P(E) denotes the probability of event E and s(A) and l(A) are the smallest
and largest roots of the matrix A, respectively.

Assumptions F and H imply that $P_{xu} \to \Pi_{xu}$ and $P_{xc} \to \Pi_{xc}$ in probability,
where $P_{xu} = M_{xu} M_{uu}^{-1}$, and $P_{xc}$ is the part of

$$(M_{xu}\ M_{xc})\begin{pmatrix} M_{uu} & M_{uc} \\ M_{cu} & M_{cc} \end{pmatrix}^{-1} \tag{3.7}$$

corresponding to the vector⁵ $c_t$. The first assertion follows because $M_{xu} M_{uu}^{-1} =
(\Pi_{xu} M_{uu} + \Pi_{xc} M_{cu} + \Pi_{xr} M_{ru} + M_{\delta u}) M_{uu}^{-1}$ and $M_{cu} \to 0$, $M_{ru} \to 0$, and $M_{\delta u} \to 0$
in probability by (3.1), (3.3) and Assumption H; the second assertion follows
similarly. Since matrix multiplication is continuous, and the characteristic roots
of a matrix are continuous functions of the matrix,⁶


$$\operatorname*{plim}_{T\to\infty} s[P_{xc} M^*_{cc} P_{xc}'] = 0, \tag{3.8}$$

where $M^*_{cc} = (M_{cc} - M_{cu} M_{uu}^{-1} M_{uc})$. This follows from the well-known theorem
(a proof of which is given in [4]) that if a random vector $X_T$ converges
stochastically to X, then $f(X_T)$ converges stochastically to f(X) if f(y) is continuous
at X.

We shall find the following lemmas convenient. The proofs are simple and
have been given in [1].

LEMMA 1. Let B be positive definite, A positive semi-definite. Then the smallest
root $\nu$ of $|A - xB| = 0$ is less than or equal to s(A)/s(B).

⁵ See Section 4 of [2].

⁶ Because of the assertion above and Assumptions F and G only one characteristic root
of the matrix approaches zero in probability.

LEMMA 2. Each element of a positive definite matrix is less in absolute value
than the largest characteristic root.
Let $\nu$ be the smallest root of

$$|P_{xc} M^*_{cc} P_{xc}' - \nu W_{xx}| = 0. \tag{3.9}$$

Then $\operatorname*{plim}_{T\to\infty} \nu W_{xx} = 0$. This statement follows from (3.8) and Lemmas 1 and 2.
Since 0 is a simple characteristic root of $\Pi_{xc}\operatorname*{plim}_{T\to\infty} M^*_{cc}\,\Pi_{xc}'$, it follows from (3.9)
and the consistency of $P_{xu}$ and $P_{xc}$ that $\hat\beta$ approaches $\beta$ apart from normalization.
The following theorem results directly:

THEOREM 1. Under Assumptions A, F, G, H, and I, and if $\operatorname*{plim}_{T\to\infty} \beta\Phi_{xx}\beta' = 1$,

$$\operatorname*{plim}_{T\to\infty} \hat\beta = \beta, \tag{3.10}$$

$$\operatorname*{plim}_{T\to\infty} \hat\gamma = \gamma, \tag{3.11}$$

where $\hat\beta$ and $\hat\gamma$ are calculated as if $r_t = 0$ and as if the remainder of A, B, C, and D
were satisfied.⁷

3.2. The non-linear case. In this section we apply the estimates obtained in [2]
to an equation of a complete system in which the remaining equations may be
non-linear. We replace Assumption A by the following assumption:

ASSUMPTION J. The selected structural equation (2.2) is one equation of a complete
system of stochastic equations:

$$F_i(y_t, z_t) = \epsilon_{ti} \qquad (i = 1, \cdots, G). \tag{3.11}$$

Let us solve the complete system (3.11) for the components of $y_t$. We obtain

$$y_{tj} = h_j(z_t, \epsilon_t). \tag{3.12}$$

Let $u_t$ be the subvector of $z_t$ occurring in the selected structural equation.
Let $c_t$ be a vector function of $z_t$ such that $\operatorname{plim} M_{uc} = 0$. We may write (3.12)
for those y's occurring in the selected structural equation as

$$x_t' = \Pi_{xu} u_t' + \Pi_{xc} c_t' + \varphi'(z_t, \epsilon_t), \tag{3.13}$$

where the components of $\varphi(z_t, \epsilon_t)$ are the residuals from the formal limiting
regression of $x_t$ on $u_t$ and $c_t$. The proof of Theorem 1 can be used to prove the
following:

THEOREM 2. If Assumptions F, G, H, I, and J are satisfied with $z_t$ replaced by
$(u_t, c_t)$ and $\delta_t$ replaced by $\varphi(z_t, \epsilon_t)$, and $r_t = 0$, and if $\operatorname*{plim}_{T\to\infty} \beta\Phi_{xx}\beta' = 1$, then

$$\operatorname*{plim}_{T\to\infty} \hat\beta = \beta, \tag{3.14}$$

$$\operatorname*{plim}_{T\to\infty} \hat\gamma = \gamma. \tag{3.15}$$

⁷ This follows from the above statements because $\hat\beta$ and $\hat\gamma$ are (vector-valued) rational
functions of $M_{zz}$, $P_{xz}$, $W_{xx}$ and $\Phi_{xx}$ which approach limits in probability.


4. The asymptotic distribution of the estimates. 

4.1. The asymptotic distribution of $P_{xu}$ and $P_{xc}$. To obtain the asymptotic
distribution of the estimates we need stronger assumptions. Throughout Sections 
4.1 and 4.2 we use Assumptions A, B, F, H, I, and the following: 

AssuMPTION K. The exogenous variables are bounded; the vector of disturbances 
of the complete system has mean zero, and is serially independent; for some » > 0 
and some M, &(| 5: |***) < M; the coordinates of z, may be linear combinations of 
lagged endogenous variables. If the endogenous part of a coordinate is 


0 G 


Za Za JriYt—r,i, 


T=1 i= 


then 
oo G 


De algal << @ 


t=1 i=l] 


and 
} G 


Zz Z ri Yt—r,i 


t=t i= 


is bounded.
AssuMPTION L. The matrix ®,, is known and constant. 
AssumPTION M. For eachi,j,k,l,1<i,j7 < Hl <k,l< K, 
? 


; 1 7 
lim = & (605625 2 21) = Kijkl 
T—2 Zr t=1 


exists. 

Let the components of M,,, M,., Mz be arranged as a vector m(T) with 
mean value u(7'). It has been shown [3] that »/7'(m(T) — u(T)) is asymptotically 
distributed according to N(O, 2), the normal distribution with mean 0 and 
covariance matrix = composed of elements 


oj = lim &(T[m(T) — u(T)] (m(T) — »(T))). 


In conjunction with this result we make repeated use of a special case of Theorem 
6 of [4]: 


Suppose »~/T(x;7 — £7) (j = 1, --: , n) have the joint asymptotic distribution 


N(0, VY) with &;7 being functions of T such that lim £;7 = &; . Let fir(zi, ++ , 2n) 
T—2 
be random Borel-measurable functions of n real variables such that alee = axjr(z) 
i 


exists with probability one for T sufficiently large and z in a fixed neighbor- 
hood of £, and suppose that there exist numbers a; such that for any « > 0, 


and x > 0, P( sup | ox ;r(z) — ax; | > €) approaches zero. Then if 
(2) (2—-Er)’ S$ (0/7) 


Yer = Ser(Qir 5 *°*,Xnr) and ner = ferlfir , +++ , Enr), the random variables 
V/T (yer — tr) have the joint asymptotic distribution N(0, AVA’), where A = 
(a;;). 








To obtain the asymptotic distributions we have only to verify that the
assumptions of this statement are satisfied, and compute A, since the asymptotic
distribution is characterized completely by AVA'. We shall denote the element in
the k-th row and l-th column of AVA' by $\sigma(f_k, f_l)$. We shall find it convenient
to use the notation df = A dx; that is, the differential df is defined in terms of the
limit matrix A.
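The covariance rule AVA' can be illustrated numerically. The Python sketch below approximates the matrix A of partial derivatives of a vector function by central differences at the limit point and then forms AVA'. This is only an illustration of the rule for a fixed, smooth function; it does not check the regularity conditions of the quoted statement, and the function names are hypothetical.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference matrix of partial derivatives of f at the point x."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        f_up, f_down = f(up), f(down)
        for i in range(m):
            J[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return J

def asymptotic_covariance(f, xi, V):
    """Return A V A', with A the matrix of partial derivatives of f at xi."""
    A = jacobian(f, xi)
    m, n = len(A), len(xi)
    AV = [[sum(A[i][k] * V[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]
    return [[sum(AV[i][k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]
```

For example, with f(x) = (x1 + x2, x1 x2), limit point (1, 2), and V the identity matrix, the matrix A has rows (1, 1) and (2, 1), so AVA' has rows (2, 3) and (3, 5).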


Let 
(4.1) A=Mu, 
(4.2) B=Ma ? 
ed) 
(4.4) E =>plim M,, , 
T-—»0 
(4.5) L = Pau ) 
(4.6) P=P,= MaMa, 
(4.7) A= In, 
(4.8) Il = Il. 


The matrix] L is the random function AM7z., + UeeMauM7z. + A of A, P is the 
random function BM; + I of B. Then 


(4.9) dL = (dA)C", 

(4.10) dP = (dB)E™. 

However 

(4.11) o(Aee , O51) = Oj, 
(4.12) o(a% , bj) = Bex, 
(4.13) o(bix , bj) = Viger , 


where a; jx1, Bijkt, Vier are the appropriate quantities kaa , respectively. From 
these we may compute oa(I;; , liz), o(li; , per), and o(p:; , Per), the elements of the 
asymptotic covariance matrix of the elements of L and P (which are asymp- 
totically normally distributed by the above). These elements can be estimated 
consistently from the sample (the proof follows from Theorem 1). 

4.2. The asymptotic distribution of $\hat\beta$ and $\hat\gamma$ for constant normalization. In this
section we shall show that $\hat\beta$ and $\hat\gamma$ are asymptotically normally distributed
(Theorem 3). In view of the above theorem on asymptotic distributions the
intricate part of the proof is in obtaining the covariance matrix. First we shall
demonstrate that the elements of $\nu W_{xx}$ are $o(1/\sqrt{T})$ in probability. Since
Assumption I holds, it is sufficient to show that $s(P_{xc} M^*_{cc} P_{xc}')$ is $o(1/\sqrt{T})$ in probability.
This means $d\,|P_{xc} M^*_{cc} P_{xc}'| = 0$, since each of the characteristic roots of
$P_{xc} M^*_{cc} P_{xc}'$ except the smallest approaches a non-zero limit in probability.

For any matrix A, $A_{ij}$ denotes the matrix obtained by deleting the i-th row
and j-th column from A, and $A_{ik,jl}$ is the matrix obtained by deleting the i-th
and k-th rows and the j-th and l-th columns. Let

$$A^{ij} = (-1)^{i+j}\,|A_{ij}|,$$

$$A^{ik,jl} = (-1)^{i+j+k+l+\kappa}\,|A_{ik,jl}|,$$

where $\kappa = 0$ if $(i - k)(j - l) > 0$, 1 otherwise when $i \neq k$, $j \neq l$. $A^{ik,jl} = 0$
if i = k or j = l. In the rest of the paper we use the summation convention of
tensor calculus for lower case indices; namely, that whenever a lower case letter
appears as a superscript and a subscript in an expression, the corresponding
terms are to be summed on that index.
In general

$$d\,|A| = A^{ij}\,da_{ij}. \tag{4.14}$$
We may consider P,,M «Pz: as a random function of Pz, . Then 


(4.15) d(i,j-th element of PzsMsPs:) = iex: dp} + wiendpi . 
However 


(4.16) (IzETz.) = p’e° = p's’, 
where p‘ is a factor of proportionality. Since 6II,, = 0, we haved | P.sM,, P;,| = 0. 
Then it can be shown that d(fl.Muflz, — PssM.Pz.) = 0, where fl. = 


(1 an wa?) Pa . 
BW.2(" 


Let © = I,.EM:, and F = P..MseP iz . We know that 8: = 0", where 
py = 1/p” (and the capital letter J indicates that there is not to be a sum on 
that index), and 6 = fi,,M..[i:, . Hence 


(4.17) dpi = p,dd” + 0 dp,. 
However §'3%g;; = 1; therefore 6, = (66*"y,,) +. From this it follows that 
(4.18) dps = —(8s)0"'pn d6". 
From (4.14) we see d6*” = 09*”'“"d6,; . Therefore 
(4.19) dg* = plo" — B'B'O' px ild 65. 
Let us define ¥; = 6'¢;;. Let us multiply (4.19) by 6,; and y,;. We obtain 
6,:d8° = ps0,0°'dbas 
= psdy 0" dbas — ps0”"'dbyg = —B"dbya, 
y.de* = 0. 
Let us simplify (4.20). We see that 


(4.20) 


(4.22) B°db,. = Bexie..dps, . 













Hence 


(4 23) o(6°d6,. ? Bd 6) aa B°n'y €x1B'm y Cnje "Yami 


_ Q%& on _m_ ¢ ni 
= BB Ty Yapmi = Tiy, 


say. Let o(8*, 8’) = qi’, and let Q; = (qi). Then from (4.20) and (4.23) we obtain 


(4.24) 00.0 = Rh, 

and (4.21) is 

(4.25) ¥Q, = 0. 

It may be shown (see [1], for example) that the solution is 
(4.26) Qi = (I — BY) .x(On)*(Ridex(Oue) “I — ¥'B)e. 5 


where k(1 < k < H) is arbitrary except that 6° ~ 0, and A. denotes A with 
the k-th column deleted, etc. If the normalization is 8° = 1, k = 7 is a convenient 
choice. 

Since 4 = —£L, 


(4.27) 
Hence 
(4.28) o(8’,4") = —0(8’, BN? — o(8’, MB’, 

(4.29) o(f”, 4") = o( 8, BY)ATAF + 0(8’, BAF + 08’, U7)B'AP + o(l?, 17) 8'6’. 


We, therefore, see that we must compute o(@’, [7)8° and (I, 17)3°3’. We find, 
from (4.20), (4.21), and (4.22) that 


dy” = —dé'n? — Bid? . 























(4.30) 0, 8'0(8', GG) = —6'B’ric”’Biipk = 12>, 

say. Let (o(8’, [7)8") = Q., and let R. = (rZ,). Then, from (4.30) and (4.21) we 
obtain 

(4.31) 6Q2 = R2, 

(4.32) yQ. = 0. 

The solution is 

(4.33) Qe = (I — 6p) ..(On) (Re)e. « 

We find, readily, that 

(4.34) B'B’o(I7, 17) = B°B’c”’c™ ain = G3", 








say, where (c””) = C™*. Let Q; = (q3”). This concludes the proof of Theorem 3. 

THEOREM 3. If Assumptions A, B, F, H, I, K, L, and M are satisfied, $\sqrt{T}(\hat\beta - \beta)$
and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with means zero
and covariance matrix


(4.35) o(8’,8) =, 






















(4.36) o(8’, 4) — QoTeu = Qe ; 
(4.37) o(4’, 4) = TwQitie. + TQ. + Qi + Qs, 


where Q,; is given by (4.26), Qe by (4.33), and Q3 by (4.34). 
If there is a kind of asymptotic independence of ¢; and z;, then the above 


expressions may be simplified. Corollary 1 results from Theorem 3 and the 
following assumption: 


le ’ 
ASSUMPTION N. lim T 7 &(¢iz:2:) = o R, where R is defined in Assumption F. 
T—2 t=1 


COROLLARY 1. If Assumptions A, B, F, H, I, K, L, M, and N are satisfied,
$\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with
means zero and covariance matrix


(4.38) o(8’, B) = o (I — Bp) x(n) (I — WB)., 
(4.39) o(8’, 4) = —o°(I — B'p).2(Ox) (Hew + Wy )e. ; 
(4.40) go 7, 4) Pe o*[(il.. + y'v).x(Oxx) (Ten + W'y)k- + o.. 

4.3. Asymptotic distribution of the estimates of the parameters B and y with 
normalization a function of Qzz . 

If we relax Assumption L that #,, is constant, we obtain a more general 
result. Since the proof, however, is more involved, we shall not give it here; 
the reader is referred to [1]. In the derivation of the estimates 2,, was defined as 


&(8,5,). In the asymptotic theory we do not assume that this is the same for 
each t. We use the following assumption: 


= 
ASSUMPTION O. lim T a (615515 51x Zu) = Nijx exists; 
T—0 t=1 


T 
lim Z 7. &(6155:5) = i; exists; 


T—-2 T t=1 


. 1l< , cet ie 
lim T » G (612613580) = Bijnr + Gi Gyr exists. 
T—0 t= 


Let 6;jx. be the quantities n;;,; corresponding to the wu’s, €:j:, the quantities 
corresponding to the c’s. Define 


(4.41) x‘ = yet at 


dw.’ 
(4.42) tay = Beryx esi, 
(4.43) a = (I — BV)«On) (rae. , 
(4.44) qs = x? x Dijet, 
(4.45) Ge = x'B"8ijmic™. 
With the aid of the matrices Q, , Q2, and Q;, the vectors q and ge. , and the 







scalar g; , we may express the asymptotic covariance matrix of the estimates. 
We obtain 

THroreM 4. Jf Assumptions A, B, F, H, I, K, M, and O are satisfied, and 
@,, is a function of Q22,+~/T(8 — 8) and ~/T(4 — vy) are asymptotically jointly 
normally distributed with means zero and covariance matrix 

Qi + 948 + B’qs + 58'B, 
—Q, Tew + qs = 8’ qalleu + qsB'y = Q> = B'ds ’ 
= Tie. QiTew or Thou qev oa y'qalleu + qsv'Y 
+ i1...Q» + Qeiley — v'% - ay + Q; , 
where Q: , Q2 , Qs, 91, 95, and gs are given by (4.26), (4.33), (4.34), (4.48), (4.44), 
and (4.45) respectively. 

Corotitary 2. If Assumptions A, B, D, F, H and K are satisfied, and 
@., = 22, VT(B — 8) andVWT4 — vy) are asymptotically jointly normally 
distributed with means zero and covariance matrix 
(4.49) 0B’, 8) = (I — B'Y).n(Oux) “(UT — WB). + 46°, 

(4.50) o(8’, 4) —(I iil Bp) (Orn) (Ten + VY )k- + 38'Y, 
(4.51) a(7’,47) = (Tes + yb) .1(Oxz) (ew + Wye. + CC + 377. 


5. Asymptotic distribution of the likelihood ratio criterion and the small
sample criterion for testing a certain hypothesis. The likelihood ratio criterion
for testing the hypothesis that the number of coordinates of $z_t$ with zero
coefficients in the selected structural equation is as great as it is assumed to be is
$(1 + \nu)^{-T/2}$ [2, Theorem 2], where $\nu$ is the smallest root of


(5.1) | PollaPs. =— vW zz | = 0. 
Then 


BP os Mae Ps mA ma 
Tv = z gos ge B — (W TBP =) ee 3 (WV TBP 2s)’. 


From Theorem 5 of [4] it follows that the asymptotic distribution of Tv is the 
same as that of the quadratic form x 2 x’, where x has the limiting distribution 


of ~/ TBP. , use being made of plim 8W,28’ = o°. We have 


T—20 
(5.3) dx’ = B’dp} + dB’x} . 
Let T = (I — B’p) (On) UW — WB)... Then 

(5.4) dp? = —v" 8 aeemndp? ‘ 
Substituting in (5.3), we obtain 


(5.5) dx = B'dp} — v"B’Temndpi mr . 







Then 
(5.6) o(a’, 2’) = o (c” — apni") = ot" 
say, and (¢”) = =. 

Let F be chosen so EK = FF’ and F’=F = W is diagonal. Since HZEEZE = EZE, 
the diagonal elements of VY are 1 and 0. The number of elements that are 1 is 


the rank of EHZE, namely, D — H + 1, where D is the number of coordinates 
of v; (the number of coordinates whose coefficients in the selected equation are 


assumed to be zero). Let z = lor. Then the asymptotic distribution of Tv 
oC 


is the distribution of zz’ where z is normally distributed with mean zero and 
covariance matrix WV. It is the x’-distribution with D — H + 1 degrees of freedom. 
We observe that T log (1 + v) and TD) are asymptotically equal to T'v, where 
is the criterion based on small sample theory [2, Theorem 4]. Finally, we note 
that v is independent of the normalization of 8. 

THEOREM 5. If Assumptions A, B, F, H, I, K, M, and N are satisfied, $-2$ times
the logarithm of the likelihood ratio criterion, $-\tfrac{T}{2}\log(1 + \nu)$, the asymptotically
equivalent $T\nu$ and $TD$ times the small sample criterion, X, for testing the hypothesis
that the number of coordinates with zero coefficients is D are asymptotically distributed
as $\chi^2$ with $D - H + 1$ degrees of freedom.

This theorem indicates how conservative the small sample test is asymptotically,
for that test asymptotically is equivalent to using $T\nu$ as having an
asymptotic $\chi^2$-distribution with D degrees of freedom.

6. Asymptotic behavior of confidence regions based on small sample theory.
In [2] we deduced confidence regions for $\beta$ and for $\beta$ and $\gamma$ when Assumption E
holds. If the normalization of $\beta$ is


(6.1) 6®,,8’ = 1, 


where ®,, is a given matrix, then a confidence region (a) for 8 of confidence « 
consists of all 8* satisfying (6.1) and 

6*M..M.. M.28* D 

ee. « ” 
(6.2) 3*W..8* =T_K Fp,r x(e), 
where [’p,7-x(e) is chosen so the probability of (6.2) for 6* = 8 is e and K is 
the number of coordinates of z,; and D is the number of coordinates of v,. A 
region (b) for 8 and y simultaneously consists of 8* and y* satisfying (6.1) and 


B*MaMyMuz8™ + B*May® + 7*MucB™ + y*Muuy® +8*M is Moe Mi 28*" 
B* Wee B*’ 
(6.3) , 
K 
Sp K Pat. 
We shall now show that even if Assumption E does not hold the regions have
asymptotically confidence coefficients $\epsilon$ and they are consistent under general
conditions.







Let c= BM.Mv. + vy, € = BM,,M;. . We observe from Section 4 that if 
Assumptions A, B, F, H, K, L, M and N are satisfied, the vectors+~/Tc and 
4/Te have asymptotic independent distributions N(0, oC) and N(O, o’ E™’), 
respectively. Then TcM,,.c’/o’ and TeM,,e’/o’ will have asymptotic independent 
x’-distributions with F(= K — D) and D degrees of freedom, respectively. 
Also 8W;28’ approaches o’ stochastically. By Theorems 5 and 6 of [4], the left- 
hand sides of (6.2) and (6.3) have asymptotic F-distributions with D and T -- K 
degrees of freedom and K and T' — K degrees of freedom, respectively. 

We shall prove that (a) is consistent for 8; the proof is similar for (b) as a 
region for Bandy. If we replace 8 by b in the definition of e,eM,..e’ =bMz,M.. M.2b’. 
For b ¥ 8 we must show that the probability that b will fall in the confidence 
region for 8 approaches zero. The above form approaches bIl,,EM;,b’ in proba- 
bility. If b + 8 and satisfies (6.1) then bII,, ¥ 0 and eM,,e’ has a non-zero limit 
in probability since £ is positive definite. Thus b is not in the limiting confidence 
region. 

THEOREM 6. If Assumptions A, B, F, H, I, K, M, and N are satisfied, the
confidence regions of Theorem 3 of [2] (including (a) and (b) above) are consistent,
and the regions (a) and (b) have asymptotically the confidence levels $\epsilon$.


REFERENCES 


[1] T. W. ANDERSON AND HERMAN RUBIN, "Estimation of the parameters of a single
stochastic difference equation in a complete system," Cowles Commission for Research
in Economics, 1947, dittoed.

[2] T. W. ANDERSON AND HERMAN RUBIN, "Estimation of the parameters of a single
equation in a complete system of stochastic equations," Annals of Math. Stat., Vol. 20
(1949), pp. 46-63.

[3] H. RUBIN, "Consistency and asymptotic normality in stable linear stochastic difference
systems," to be published.

[4] H. RUBIN, "Topological properties of measures on topological spaces," Duke Math.
Journ., to be published.





SOME NONPARAMETRIC TESTS OF WHETHER THE LARGEST 
OBSERVATIONS OF A SET ARE TOO LARGE 
OR TOO SMALL 


By JOHN E. WALSH


The Rand Corporation 


1. Summary. Let us consider a large number n of observations which are statis- 
tically independent and drawn from continuous symmetrical populations. This 
paper presents some nonparametric tests of whether the r largest observations 
of the set are too large to be consistent with the hypothesis that these populations 
have a common median value. Tests of whether the r largest observations are 
too small to be consistent with this hypothesis are also considered. Here r is a 
given integer which is independent of n. 

Subject to some weak restrictions, it is shown that the significance level of a
test of the type presented tends to a value $\alpha$ as n increases. For no admissible
value of n, however, does the significance level of this test exceed $2\alpha$. When the
question is whether the largest observations are too large, tests with values of $\alpha$
suitable for significance levels can be obtained for $r \ge 4$. When the question is
whether the largest observations are too small, values of $\alpha$ suitable for
significance levels can be obtained for any value of r (n large).

Properties of the power functions of these tests are considered for the special
case in which the r largest observations are from populations with common
median $\theta$, the remaining observations are from populations with common
median $\phi$, and each population has the property that the distribution of the
quantity

(sample value) - (population median)

is independent of the value of the population median. For tests of $\theta > \phi$, the
power function tends to zero as $\theta - \phi \to -\infty$ and to unity as $\theta - \phi \to \infty$. For
tests of $\phi > \theta$, the power function tends to unity as $\theta - \phi \to -\infty$ and to zero
as $\theta - \phi \to \infty$.

Analogous tests of whether the smallest observations of a set are too small or 
too large can be obtained from the tests of the largest observations by symmetry 
considerations. 

If there is strong reason to believe that the set of observations is a random
sample from a continuous population, the tests presented in this paper can be
used to decide whether the population is symmetrical. Tests of this nature are
sensitive to symmetry in the tails of the population but not to symmetry in the
central part.


2. Introduction and statement of tests. The tests derived in this paper are 
applicable to situations of the following two types: 


(a). It is known that the observations are independent and from continuous
symmetrical populations (i.e., each population has a continuous cdf $F(x)$
such that $F(x - \phi) = 1 - F(\phi - x)$, where $\phi$ is the population median).
It is desired to test whether the largest few observations are too large
(or too small) to be consistent with the assumption that the populations
have a common median value (if the 50% point of a continuous symmetrical
population is not unique, the median of this population is defined
to be the midpoint of the interval of 50% points).

(b). It is known that the observations are independent and from continuous 
populations with a common median value (e.g., the observations may 
be a sample from a continuous population). It is desired to test whether 
these populations are symmetrical (with emphasis on the tails of the 
population). 

With respect to (a), perhaps the most common practical application is that 
where the observations are assumed to be a sample from a continuous sym- 
metrical population of some special type (e.g., normal) but the values of the 
largest few observations make this assumption questionable. The nonparametric 
tests presented for (a) are easily applied and a significant result for a non- 
parametric test automatically implies that the observations are not a sample 
from the specified type of population. Furthermore, if a parametric test of this 
situation (i.e., a test based on the assumption of a sample from this special type 
of population) is significant, the nonparametric tests are useful in determining 
whether it is possible that the observations might be a sample from a continuous 
symmetrical population of some other type. 

With respect to (b), perhaps the most common application is that where the 
set of observations can be considered to be a sample from a continuous population 
and it is desired to test whether this population is symmetrical in the tails. 

Now let us consider the forms of the tests. Let $x(1), \ldots, x(n)$ represent the
values of the n observations arranged in increasing order of magnitude. Then
$x(n + 1 - r), x(n + 2 - r), \ldots, x(n)$ are the r largest observations of the
set. For situations of type (a), the tests of whether the r largest observations
are too large are of the form

Test 1. Accept that the r largest observations are too large to be consistent with
the hypothesis that the populations have a common median if

$$\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha),$$
where the $i$'s, $j$'s and $n$ are integers such that
1 =f, a. wm Su XK Dost s ji <Wex<n+1-r, 
$\alpha$ is defined by

$$\alpha = \Pr\{\min\,[x(n + 1 - i_k) + x(j_k)] > 2\phi \mid \phi = \text{common median}\},$$

and $W_\alpha = W_\alpha(n)$ is the smallest integer satisfying the relation

(1) $\Pr\{x(W_\alpha) < \phi \mid \phi = \text{common median}\} \le \alpha.$










In testing the hypothesis of Test 1, the principle followed is to choose
$x(n + 1 - r)$ and some subset of $x(n + 2 - r), \ldots, x(n)$ for use in the test.
The integer s represents the total number of order statistics selected from
$x(n + 1 - r), \ldots, x(n)$.

The value of $\alpha = \alpha(i_1, \ldots, i_s; j_1, \ldots, j_s)$ is independent of n and is given
by equation (4) in Section 3. Table 1 contains some values of the $i$'s, $j$'s and s
which yield values of $\alpha$ suitable for significance levels. For Test 1, values of $\alpha$
suitable for significance levels can be obtained for $r \ge 4$.



























































TABLE 1
Some values of $\alpha$ for $s \le 5$

  alpha   s   i1  i2  i3  i4  i5    j1  j2  j3  j4  j5
  .0625   1    4                     1
  .0312   1    5                     1
  .0156   1    6                     1
  .0078   1    7                     1
  .0039   1    8                     1
  .0352   1    7                     2
  .0195   1    8                     2
  .0107   1    9                     2

  .0469   2    4   5                 1   2
  .0234   2    5   6                 1   2
  .0117   2    6   7                 1   2
  .0059   2    7   8                 1   2

  .0391   3    4   5   6             1   2   3
  .0195   3    5   6   7             1   2   3
  .0098   3    6   7   8             1   2   3

  .0459   4    4   5   6   7         1   2   3   4
  .0229   4    5   6   7   8         1   2   3   4
  .0115   4    6   7   8   9         1   2   3   4

  .0308   5    4   5   6   7   8     1   2   3   4   5
  .0154   5    5   6   7   8   9     1   2   3   4   5
  .0077   5    6   7   8   9  10     1   2   3   4   5
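
For small index sets, the probability $\alpha$ can be checked by direct enumeration. The sketch below is not the paper's formula (3); it assumes the observations are a sample from a single continuous symmetrical population with median zero, so that the signs of the observations are independent of the ranks of their absolute values and all $2^n$ sign patterns are equally likely. The function name `alpha_exact` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def alpha_exact(i_idx, j_idx, n=None):
    """Pr{min[x(n+1-i_k) + x(j_k)] > 0} for a symmetric sample with median 0.

    Only the sign attached to each absolute-value rank matters for the event,
    so the observations are represented by the signed ranks +-1, ..., +-n.
    """
    if n is None:
        n = i_idx[-1] + j_idx[-1] - 1  # smallest n admitted by Theorem 1
    hits = 0
    for signs in product((-1, 1), repeat=n):
        x = sorted(sign * rank for rank, sign in enumerate(signs, start=1))
        # x(t) in the text is 1-indexed: x(t) == x[t - 1].
        if all(x[n - i] + x[j - 1] > 0 for i, j in zip(i_idx, j_idx)):
            hits += 1
    return Fraction(hits, 2 ** n)


a_s1 = alpha_exact((4,), (1,))
a_s1_j2 = alpha_exact((7,), (2,))
a_s2 = alpha_exact((4, 5), (1, 2))
a_s2_larger_n = alpha_exact((4, 5), (1, 2), n=7)
a_s3 = alpha_exact((4, 5, 6), (1, 2, 3))
```

Under these assumptions the enumeration returns 1/16, 9/256, 3/64 and 5/128 for the four index sets shown, and the value for $(i_1, i_2; j_1, j_2) = (4, 5; 1, 2)$ is the same at $n = 6$ and $n = 7$. The cost grows as $2^n$, so this is practical only for the small $n$ used here.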








If the n independent observations satisfy the additional conditions

(A) (i). Asymptotically ($n \to \infty$), $x(W_\alpha)$ is statistically independent of
$\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$.

(ii). The standard deviations of $x(W_\alpha)$ and $\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$
exist for all $n \ge i_s + j_s - 1$ and the limiting ratio ($n \to \infty$)
of these standard deviations is either zero or infinite.

(iii). Let the notation $\sigma(z)$ denote the standard deviation of z. Then, if the
populations have a common median $\phi$, asymptotically the cdf's of
$[x(W_\alpha) - \phi]/\sigma[x(W_\alpha)]$ and
$\{\min\,[x(n + 1 - i_k) + x(j_k)] - 2\phi\}/\sigma\{\min\,[x(n + 1 - i_k) + x(j_k)]\}$
are continuous at the point zero.

then the significance level of Test 1 approaches the value $\alpha$ as n tends to infinity.

Although conditions (A) may appear to be complicated, they are not very 
restrictive. These conditions are satisfied if the n observations are a sample 
from a continuous population of the type usually encountered in practical 
situations (i.e., approximated in practical situations). Perhaps the most well 
known type of continuous symmetrical population for which a sample does not 
satisfy conditions (A) is that with a triangular probability density function. 
Part (ii) of conditions (A) is not satisfied for a sample from a population of 
this type. 

For large n, relation (1) with the equality sign is approximately satisfied if
$W_\alpha = \tfrac12 n + \tfrac12 K_\alpha\sqrt{n}$ (i.e., the largest integer contained in $\tfrac12 n + \tfrac12 K_\alpha\sqrt{n}$).
Here $K_\alpha$ is the standardized normal deviate exceeded with probability $\alpha$. This
value for $W_\alpha$ was obtained from the normal approximation to the binomial
distribution and furnishes a reasonably accurate solution of (1) with the equality
sign for n > 10 (see [1]).
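
The two ways of choosing $W_\alpha$ can be compared numerically. The sketch below assumes that each observation falls below the common median with probability exactly one half, so that $x(W) < \phi$ happens precisely when at least $W$ of the $n$ observations lie below $\phi$ and the count is binomial; the function names `w_exact` and `w_normal` are hypothetical.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def w_exact(n, alpha):
    """Smallest W with Pr{Binomial(n, 1/2) >= W} <= alpha, or None if no W works."""
    total = 2 ** n
    tail = 0          # running count of outcomes with at least w observations below the median
    best = None
    for w in range(n, 0, -1):
        tail += comb(n, w)
        if tail > alpha * total:
            break
        best = w
    return best


def w_normal(n, alpha):
    """Largest integer contained in n/2 + (1/2) K_alpha sqrt(n)."""
    k_alpha = NormalDist().inv_cdf(1 - alpha)  # deviate exceeded with probability alpha
    return floor(0.5 * n + 0.5 * k_alpha * sqrt(n))


w_from_binomial = w_exact(100, 0.05)
w_from_normal = w_normal(100, 0.05)
```

Because the normal formula takes an integer part while relation (1) asks for the smallest admissible integer, the two values can differ by one; taking the larger of the two keeps the probability in (1) at or below $\alpha$.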

As an example of a test of type 1, let $r = 5$, $s = 2$, $j_1 = 1$, $j_2 = 2$, $i_1 = 4$,
$i_2 = 5$. Then $\alpha = .0547$ and the test is (approximately)

Test 2. Accept the specified alternative of Test 1 if

$$\min\,[x(n - 3) + x(1),\ x(n - 4) + x(2)] > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}).$$


That this is a test of whether the 5 largest observations are too large is intuitively
evident from the fact that a significant result will be obtained only if both

$$x(n - 3) > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}) - x(1),$$
$$x(n - 4) > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}) - x(2).$$


If the smallest two of the five largest observations are too large, it seems reason- 
able to suppose that all of the five are too large. A similar interpretation exists 
for all tests of the type of Test 1. 

The type (a) tests of whether the largest observations are too small are of 
the form 

Test 3. Accept that the r largest observations are too small to be consistent with
the hypothesis that the populations have a common median value if

$$\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n + 1 - W_\alpha),$$


where je = 1, Jv < joi, tu < hus 6 <n+t1l—We<n+1 —r7, and botha 
and W, are defined in Test 1. 

From the results for Test 1 and symmetry considerations, the significance
level of Test 3 tends to $\alpha$ as $n \to \infty$ if conditions (A) are satisfied; it does not
exceed $2\alpha$ for any admissible value of n. For Test 3, values of $\alpha$ suitable for
significance levels can be obtained for all values of r (n sufficiently large).

As indicated by (2), the tests of whether the largest observations are too large
can also be interpreted as tests of whether the smallest observations are too
large. Similarly the tests of whether the largest observations are too small can
also be interpreted as tests of whether the smallest observations are too small.

The above discussion presents intuitive reasons for believing that Tests 1 and 3
are suitable for the situations to which they are applied. To obtain a
semi-quantitative measure of the suitability of these tests, this paper investigates
the special case in which the r largest observations are from continuous
symmetrical populations with common median $\theta$, the remaining observations are
from continuous symmetrical populations with common median $\phi$, and each
population has the property that the distribution of $x - \mu$ is independent of $\mu$,
where $x$ is an observation from the population and $\mu$ is the median of the
population. The power function of a test of type 1 or 3 is defined to be the probability
that the test is significant given the value of $\theta - \phi$. It is found that the power
functions of these tests have several desirable properties: For Test 1, the power
function tends to zero as $\theta - \phi \to -\infty$, is a monotonically increasing function
of $\theta - \phi$ for $\theta - \phi < 0$, and tends to unity as $\theta - \phi \to \infty$. For Test 3, the
power function tends to zero as $\theta - \phi \to \infty$, is monotonically decreasing for
$\theta - \phi < 0$, and tends to unity as $\theta - \phi \to -\infty$.

For testing whether the populations are symmetrical in the tails given that
they are continuous and have a common median, i.e., situation (b), a combination
of Tests 1 and 3 is used. The resulting test is

Test 4. Accept that the populations are not symmetrical in the tails if either

$$\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)$$

or

$$\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(n + 1 - W_\alpha),$$


where a < 4, tu < tusi, Jo < Jott, Jw < tw, Js << Wa <n+1 —i,, andbotha 
and W, are defined in Test 1. 

Since both inequalities in Test 4 can not be satisfied simultaneously, the
significance level of Test 4 tends to $2\alpha$ as $n \to \infty$ if conditions (A) are satisfied;
it never exceeds $4\alpha$ for any admissible value of n.

The asymptotic distribution ($n \to \infty$) of $x(W_\alpha)$ is usually not very sensitive
to symmetry of the populations. For example, if the n observations are a sample
from a population with a probability density function $f(x)$ such that $f(\phi) \ne 0$
($\phi$ = population 50% point), and $f'(x)$ exists and is continuous in a neighborhood
of $x = \phi$, it can be shown that the only property of $f(x)$ which influences the
asymptotic distribution of $x(W_\alpha)$ is the value of $f(\phi)$. Thus, since a type 1 test
investigates both whether the largest observations are too large and whether
the smallest observations are too large (to be consistent with the assumption of
symmetry), while a type 3 test investigates both whether the largest observations
are too small and whether the smallest observations are too small, Test 4 should
be suitable for testing whether a population has symmetrical tails.

























3. Theorems and derivations. The fundamental fact used in this paper is
that, if the observations are from continuous symmetrical populations with
common median $\phi$, the value of

$$\Pr\{\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2\phi\}
 = \Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$$

is independent of n for the values of n permitted in the tests. This result is a
special case of the following theorem.

THEOREM 1. Consider a set of n independent observations from continuous
symmetrical populations with common median $\phi$. Let $i_1 < \cdots < i_s$ and $j_1 < \cdots < j_s$
be fixed sets of integers whose values are independent of n. Then the value of


Pr{@th largest of [a(n + 1 — jx) + x(k); 1 << k < 8] < 2p} 


is the same for all values of n which are $\ge i_s + j_s - 1$. In particular


Qa 


m(2) m(3) m(2)—he 
a= ft + m/(1) +h [m(1) — Ay] + py a [m(1) — hi — he] + «= 


(3) m(u) m(u—1)—hy} m(2)—ho—+**—hy_} 
4+ 2 7 wee pm [m(1) —hy — -e* — hua, 
hy_1=1 hy—2=1 hy=1 


where 








w=i1.+ 7. —1, u=j,—1, mMijetve— 1) = t+ je — te — Je — 1 + I, 
~=0,1,---,s-—1, lsvuS jeri — Je; yo =j—-1=0. 
PROOF. It is sufficient to prove the theorem for the expression

$$\Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\},$$


since any probability expression of the form Pr{@th largest of [] < 29} 

can be expressed as a specified constant plus a sum of probabilities of the form 

Pr{max [ ] < 2} multiplied by specified constants, where in each case the 

terms in the [ ] area subset of thes terms: x(n + 1 — jx) + r(%), 1S k <8). 
Let the integer n have the value m . Then it can be verified that 


Pr{max [x(mo + 1 — je) + e(%e)3; 1 Sk < 8] < 26} 
(4) = Pr{max {22(m — js), te[mo +1 — W)+2[m+1-—- W-— m(W)); 


1<W < je} < 29], 
where 








Mj: tve—-—1) =m+2—-—%:—fe—%, m(js) = % — ts — Je =| 1, 
t=0,1,---,s-1l, lsuSjeir— Je; ob =j — 1 =0, 


by the use of Theorem 4 of [2]. By the proof of Theorem 5 of [2], the value 
of the second term in (4) equals 













Pr{max {2x(m — js), 2[mo + 2 — W] + alm + 1 — W — m(W)); 
1<wW<jt+1} < 2) 


if m(js + 1) = 1 and the expression is based on m + 1 rather than 7 observations 
(the values of the m’s are the same as in (4)). The value of this expression, 
however, can be shown to equal the value of 


Pr{max {2x(m + 1 — js), a[no + 2 — W) + alm + 2 — W — m(W)); 
1<W <j} < 24], 
which by (4) equals the value of 
Pr{max [x(mo + 2 — jx) + t(%e);1 Sk < 8] < 29} 
if $n = n_0 + 1$ for this expression. Thus, by induction, the value of

$$\Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$$

is the same for all sample sizes $n \ge i_s + j_s$. An analysis similar to that used
in the proof of Theorem 5 of [2] shows that this also holds for $n = i_s + j_s - 1$.
Equation (3) was obtained by taking $n = w = i_s + j_s - 1$, the $m$'s as given by
(4) with this value of n, and substituting into Theorem 4 of [2].

Another basic result is that, if the observations are from continuous symmetrical
populations with common median $\phi$, the value of

$$\Pr\{\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\}
 = \Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(n + 1 - W_\alpha)\}$$

is always less than or equal to $2\alpha$. This is a particular application of the theorem

THEOREM 2. Consider n independent observations from continuous symmetrical
populations with common median $\phi$. Then, for any integer W,

$$\Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s] < 2x(W)\}
 \le \Pr\{\max\,[x(n + 1 - j_k) + x(i_k)] < 2\phi\} + \Pr\{x(W) > \phi\}
 - \Pr\{\max\,[x(n + 1 - j_k) + x(i_k)] < 2\phi,\ x(W) > \phi\}.$$

PROOF.

$$\begin{aligned}
\Pr\{\max[\ ] < 2x(W)\} &= \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} \\
&\quad + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi,\ \max[\ ] < 2x(W)\} \\
&\quad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi,\ \max[\ ] < 2x(W)\} \\
&\le \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi\} \\
&\quad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi\} \\
&= \Pr\{\max[\ ] < 2\phi\} + \Pr\{x(W) > \phi\} - \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}.
\end{aligned}$$













If the n independent observations satisfy conditions (A) in addition to being
from continuous symmetrical populations with a common median value, the
significance level of Tests 1 and 3 tends to $\alpha$ as $n \to \infty$. This follows from
symmetry considerations and

THEOREM 3. Consider n independent observations which satisfy conditions (A)
and are from continuous symmetrical populations with a common median value.
Then

$$\lim_{n \to \infty} \Pr\{\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\} = \alpha.$$

PROOF. Let

$$Y = \min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s]$$

and consider the case where

$$\lim_{n \to \infty} \sigma[x(W_\alpha)]/\sigma(Y) = 0.$$

Since the populations are continuous, $\sigma(Y) > 0$ and

$$\Pr[Y > 2x(W_\alpha)] = \Pr[Y - 2\phi > 2x(W_\alpha) - 2\phi]
 = \Pr\{[Y - 2\phi]/\sigma(Y) > 2[x(W_\alpha) - \phi]/\sigma(Y)\}.$$

Let

$$Z = 2[x(W_\alpha) - \phi]/\sigma(Y).$$

Then, from (i) of conditions (A),

$$\Pr[Y > 2x(W_\alpha)] = \int_{-\infty}^{\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > a\}\, dF_n(a) + \beta(n),$$

where $F_n$ is the cdf of Z and $\lim_{n \to \infty} \beta(n) = 0$.

Let b be any positive number. From $\lim_{n \to \infty} \sigma(Z) = 0$, (ii) of conditions (A), and
the definition of $x(W_\alpha)$, the mean of Z exists for all values of n and tends to
zero as $n \to \infty$. Then, by Tchebycheff's Inequality, it can be shown that

$$\int_{-b}^{b} dF_n(a) = 1 - \gamma(n),$$

where $\lim_{n \to \infty} \gamma(n) = 0$.

From (iii) of conditions (A)

$$\lim_{n \to \infty} \Pr\{[Y - 2\phi]/\sigma(Y) > -b\} = \lim_{n \to \infty} \Pr\{[Y - 2\phi]/\sigma(Y) > b\} + \delta(b),$$

where $\lim_{b \to 0} \delta(b) = 0$.

Using the above relations, letting $n \to \infty$ first and then $b \to 0$, it follows from
Theorem 1 that

$$\lim_{n \to \infty} \Pr[Y > 2x(W_\alpha)] = \Pr\{[Y - 2\phi]/\sigma(Y) > 0\} = \alpha.$$


























A similar type proof shows that this limiting relation also holds when

$$\lim_{n \to \infty} \sigma[x(W_\alpha)]/\sigma(Y) = \infty.$$

Finally consider properties of the power functions of Tests 1 and 3 for the
special situation outlined in sections 1 and 2. The properties stated in the
preceding two sections follow from


THEOREM 4. Let $x(n + 1 - r), \ldots, x(n)$ be from continuous symmetrical populations
with common median $\theta$, the remaining order statistics from continuous symmetrical
populations with common median $\phi$, and each population have the property
that the distribution of $x - \mu$ is independent of $\mu$, where $x$ is an observation from
the population and $\mu$ is the median of the population. Also let

$$P_1(\Phi) = \Pr\{\min\,[x(n + 1 - i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha) \mid \theta - \phi = \Phi\},$$

where the conditions for Test 1 are satisfied, and

$$P_3(\Phi) = \Pr\{\max\,[x(n + 1 - j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n + 1 - W_\alpha) \mid \theta - \phi = \Phi\},$$

where the conditions for Test 3 are satisfied. Then

$$\lim_{\Phi \to -\infty} P_1(\Phi) = 0, \qquad \lim_{\Phi \to \infty} P_1(\Phi) = 1,$$
$$\lim_{\Phi \to -\infty} P_3(\Phi) = 1, \qquad \lim_{\Phi \to \infty} P_3(\Phi) = 0,$$

$P_1(\Phi)$ is a monotonically increasing function of $\Phi$ for $\Phi < 0$, and $P_3(\Phi)$ is a
monotonically decreasing function of $\Phi$ for $\Phi < 0$.

PROOF. It is sufficient to prove this theorem for the power function of Test 3.
The results for $P_1(\Phi)$ can be obtained from symmetry considerations and obvious
modifications of the proof for $P_3(\Phi)$.

First consider $P_3(\Phi)$ for the case where $\Phi < 0$. Let a new set of observations
be formed from the given set by subtracting the median value of the
corresponding population from each observation. Let $y(1), \ldots, y(n)$ be the values
of the set of modified observations arranged in increasing order of magnitude.
Since  < 0, 6 < ¢ and 


lstsn-r, 
n-r+1lstsn. 
Thus 
P;@) = Pr{max lyn +1— jp) ty@);loksscr) 


— 2y¥(n+1-—-— W,) < —9#}, 










whence it follows that $P_3(\Phi)$ is a monotonically decreasing function of $\Phi$ for
$\Phi < 0$ and that $\lim_{\Phi \to -\infty} P_3(\Phi) = 1$.

Now consider the case where $\Phi > 0$. Again form the set of modified observations
and let $y(1), \ldots, y(n)$ be the values of these observations arranged in
increasing order of magnitude. Then it is easily seen that

$$P_3(\Phi) \le \Pr[y(1) - y(n) < -\tfrac12\Phi]$$

so that $\lim_{\Phi \to \infty} P_3(\Phi) = 0$.
REFERENCES 
[1] PAUL G. HOEL, Introduction to Mathematical Statistics, John Wiley and Sons, 1947,
p. 45.

[2] JOHN E. WALSH, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.


ON A MEASURE OF DEPENDENCE BETWEEN 
TWO RANDOM VARIABLES 


By NILS BLOMQVIST
University of Stockholm and Boston University 


1. Summary. The properties of a measure of dependence $q'$ between two
random variables are studied. It is shown (Sections 3-5) that $q'$ under fairly
general conditions has an asymptotically normal distribution and provides
approximate confidence limits for the population analogue of $q'$. A test of
independence based on $q'$ is non-parametric (Section 6), and its asymptotic efficiency
in the normal case is about 41% (Section 7). The $q'$-distribution in the case of
independence is tabulated for sample sizes up to 50.


2. Introduction and definitions. In drawing conclusions from statistical data 
it frequently happens that it is unnecessary to utilize all the information given 
by the data. In such cases it seems desirable to use methods which are 

1) valid under rather weak assumptions regarding the distribution of the 
population and 

2) easy to deal with in practice. 

Naturally such methods should always be used, but their applicability is, in 
most cases, limited by their small efficiency. 

Concerning methods of measuring correlation and testing independence some 
so-called rank correlation coefficients have been defined [2, 3, 4, 6] which have 
the first property. In large samples these are, however, rather tiresome to calcu- 
late, and a simpler method might then be preferable. The coefficient studied 
here has in most cases both properties mentioned above and can be used when- 
ever its efficiency is not too small. 

Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a sample from a two-dimensional population with
cdf $F(x, y)$, and consider the two sample medians $\tilde{x}$ and $\tilde{y}$. The cdf $F(x, y)$ is
assumed to have continuous marginal cdf's $F_1(x)$ and $F_2(y)$ in order that the
probability of obtaining two equal x-values or two equal y-values in the sample
will be zero. Let the x, y-plane be divided into four regions by the lines $x = \tilde{x}$
and $y = \tilde{y}$. It is then clear that some information about the correlation between
x and y can be obtained from the number of sample points, say $n_1$, belonging
to the first or third quadrants compared with the number, say $n_2$, belonging
to the second or fourth quadrants.

Before going further we shall explain what is meant here by 'belong to'. If
the sample size n is an even number the calculation of $n_1$ and $n_2$ is evident. If,
however, n is an odd number one or two sample points must fall on the lines
$x = \tilde{x}$ and $y = \tilde{y}$. In the first case this sample point shall not be counted. In
the other case one point falls on each of the lines. Then one of the points shall
be said to belong to the quadrant touched by both points, while the other shall
not be counted. It is easy to verify that both $n_1$ and $n_2$ by this method will be
even numbers.

As a measure of correlation we define 

(1)  $q' = \frac{n_1 - n_2}{n_1 + n_2} = \frac{2n_1}{n_1 + n_2} - 1 \qquad (-1 \le q' \le 1).$

The definition of q’ is not new [5] but as far as is known, its statistical proper-
ties have never been studied completely.
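The counting rule of Section 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `blomqvist_q` is hypothetical, the sample is assumed to contain no tied x-values or tied y-values (as the continuity assumption guarantees with probability one), and the sample medians are taken from the standard library, which for even n returns the midpoint of the two middle values.

```python
import statistics

def blomqvist_q(xs, ys):
    """Return (n1, n2, q') for paired data, assuming no tied x- or y-values."""
    mx = statistics.median(xs)
    my = statistics.median(ys)
    n1 = n2 = 0
    side_x = side_y = 0  # signs recorded for points lying on a median line (odd n only)
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        if dx == 0 and dy == 0:
            continue                      # one point determines both medians: not counted
        if dx == 0:
            side_y = 1 if dy > 0 else -1  # point on the line x = median(x)
        elif dy == 0:
            side_x = 1 if dx > 0 else -1  # point on the line y = median(y)
        elif dx * dy > 0:
            n1 += 1                       # first or third quadrant
        else:
            n2 += 1                       # second or fourth quadrant
    if side_x and side_y:
        # Two points lie on the lines: count one of them in the quadrant
        # touched by both, and leave the other uncounted.
        if side_x * side_y > 0:
            n1 += 1
        else:
            n2 += 1
    return n1, n2, (n1 - n2) / (n1 + n2)
```

With an odd sample such as xs = [1, 2, 3, 4, 5], ys = [2, 1, 5, 3, 4], two points fall on the median lines and the rule adds one of them to the first quadrant, so that both counts stay even.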


3. The asymptotic distribution. It is known [1] that the median in a sample 
from a one-dimensional distribution under certain conditions is a consistent 
estimate of the population median and asymptotically normally distributed. 
Although it seems possible to weaken the requirements in our case, we shall not 
do so. We require that 

a) the population medians are uniquely defined (and assumed to equal zero),

b) the marginal distributions of $F(x, y)$ admit density functions $f_1(x)$ and
$f_2(y)$,

c) $f_1(x)$, $f_2(y)$ and their first derivatives are continuous in some neighbourhood
of the origin and

d) $f_1(0)$ and $f_2(0)$ are $\neq 0$.

In order to avoid trivial complications we shall assume here that the sample
size n = 2k + 1.
Now define for every arbitrarily chosen point (x, y)


(2)  $a(x, y) = P\{\xi > x,\ \eta > y\},$
     $b(x, y) = P\{\xi < x,\ \eta > y\},$
     $c(x, y) = P\{\xi < x,\ \eta < y\},$
     $d(x, y) = P\{\xi > x,\ \eta < y\},$

where the measure P refers to the cdf $F(x, y)$ and evidently
$a + b + c + d = 1.$


As the number of sample points belonging to the first and third quadrants
around $(\tilde{x}, \tilde{y})$ must be equal, the probability of the combined event

$\{n_1 = 2r;\ \tilde{x} \in (x, x + dx),\ \tilde{y} \in (y, y + dy)\}$

is

(3)  $p_k(2r; x, y) = \frac{(2k + 1)!}{(r!)^2\,((k - r)!)^2}\,(ac)^r\,(bd)^{k-r} \cdot S,$

where
i tw dende~ 2 este 
a b 


k-r 


+o + dee» dye — - 7 -d,d-d,d + dF. 







Each of the first four terms of the expression (4) refers to a case in which two
sample points determine $(\tilde{x}, \tilde{y})$, and the last term refers to a case in which $(\tilde{x}, \tilde{y})$
is determined by only one point. From (3) it follows that the probability of
obtaining $n_1$ at most equal to 2R is


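(5)  $P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{r=0}^{R} p_k(2r; x, y).$

If we introduce the joint cdf $\Psi_k(x, y)$ of $\tilde{x}$ and $\tilde{y}$, (5) can be written

(6)  $P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} d\Psi_k(x, y)\, \frac{\sum_{r=0}^{R} p_k(2r; x, y)}{\sum_{r=0}^{k} p_k(2r; x, y)},$

as

$d\Psi_k(x, y) = \sum_{r=0}^{k} p_k(2r; x, y).$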


Clearly the integrand in (6) is $\le 1$ everywhere it exists. In the points (x, y)
where the denominator is equal to zero the integrand is undefined, but as the
measure ($\Psi_k$) of the set of such points is zero, we need not have any trouble
with them.

Under the conditions a)-d) $\tilde{x}$ and $\tilde{y}$ converge in probability to zero; that is

$\lim_{k \to \infty} \Psi_k(x, y) = 1$ for $\{x > 0,\ y > 0\}$, and $= 0$ otherwise.

Thus, when k and R tend to infinity such that $R/k \to$ const, (6) becomes

(7)  $\lim P\{n_1 \le 2R\} = \lim \frac{\sum_{r=0}^{R} p_k(2r; 0, 0)}{\sum_{r=0}^{k} p_k(2r; 0, 0)}.$


According to (3)

(8)  $p_k(2r; 0, 0) = \frac{(2k + 1)!}{(r!)^2\,((k - r)!)^2}\,(a_0 c_0)^r\,(b_0 d_0)^{k-r} \cdot S_0,$

where the subscripts indicate the value at the point (0, 0). Because of (2),
$c_0 = a_0$, $d_0 = b_0$ and $a_0 + b_0 = \frac{1}{2}$,
and the two parts of (8) are for large k


(2k + 1)! 


1 
27rdo boV/ 2xk - 


or 12(k—r) —((r—2kag) 2/4kagho) 
<a ~ . -_ 







and 


~9(2,0,-OO.+OG).- ORs 


The first of these expressions follows from the usual application of Stirling’s 
approximation formula and we omit all details here. 
Hence, after the introduction of

$r = 2ka_0 + t\sqrt{2ka_0 b_0},$
$R = 2ka_0 + \tau\sqrt{2ka_0 b_0},$

the expression (7) is transformed to

(9)  $\lim_{k \to \infty} P\left\{\frac{n_1 - 4ka_0}{\sqrt{8ka_0 b_0}} \le \tau\right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\tau} e^{-t^2/2}\,dt.$

From (9) it follows that $n_1$ is asymptotically normally distributed with mean
$4ka_0$ and standard deviation $\sqrt{8ka_0 b_0}$. Thus

$q' = \frac{n_1 - n_2}{2k} = \frac{n_1}{k} - 1$

is asymptotically normally distributed with mean $4a_0 - 1$ and standard deviation
$2\sqrt{a_0(1 - 2a_0)/k}$.


4. Properties as an estimator. Suppose we measure the correlation between 
x and y by 


(10)  $q = 2\left[\int_0^{\infty}\!\!\int_0^{\infty} dF + \int_{-\infty}^{0}\!\!\int_{-\infty}^{0} dF\right] - 1 = 4a_0 - 1,$

where, as before, (0, 0) are the coordinates of the population medians. Then q
has the desired property of being equal to zero in the case of independence and
equal to $\pm 1$ in the case of linear relationship between x and y.

According to (9) q’ is a consistent estimate of q when the conditions a)-d) are
fulfilled. Furthermore, as the standard deviation of q’ is, to a first approximation,
independent of quantities other than q, it is possible to construct approximate
confidence limits for q for large sample sizes. This is done in the following way.
In terms of n and q we have, according to the last paragraph of Section 3 and
(10),


$E q' \sim q,$

$\sigma(q') \sim \sqrt{\frac{1 - q^2}{n}}.$

Let $\Phi(z)$ be a standardized normal cdf and $\lambda_1$ and $\lambda_2$ two numbers such that
$\Phi(\lambda_2) - \Phi(\lambda_1) = 1 - \alpha$. According to (9) we then have

(11)  $P\left\{\lambda_1 < \frac{q' - q}{\sqrt{1 - q^2}}\sqrt{n} < \lambda_2\right\} \sim 1 - \alpha,$

which gives the desired result.
If we let $\lambda_2 = -\lambda_1 = \lambda$ and solve the inequality in (11) for q, the following
symmetrical confidence interval is obtained

$q' - \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)} < q < q' + \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)},$

where we have used that $\lambda^2 \ll n$.
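As a rough illustration of the large-sample interval, the sketch below evaluates q’ ± (λ/n)√(λ² + n(1 − q’²)), which is what solving the inequality in (11) gives when λ² is small compared with n. The function name and the default level are hypothetical choices made for the example, and λ is taken as the upper α/2 point of the standard normal distribution.

```python
import math
from statistics import NormalDist

def q_confidence_interval(q_hat, n, alpha=0.05):
    """Approximate symmetric confidence interval for q from the sample value q'."""
    lam = NormalDist().inv_cdf(1 - alpha / 2)   # lambda with Phi(lam) - Phi(-lam) = 1 - alpha
    half_width = (lam / n) * math.sqrt(lam ** 2 + n * (1 - q_hat ** 2))
    return q_hat - half_width, q_hat + half_width
```

For q’ = 0 and n = 100 at the 95% level the half-width is close to 0.20, which shows how wide the interval remains for moderate samples.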


5. The normal case. If x and y are normally distributed with correlation
coefficient $\rho$, we have

(12)  $q = \frac{2}{\pi} \arcsin \rho.$

This expression is the same as the mean of Esscher-Kendall's rank correlation
coefficient $\tau$ [2, 4]. Hence, in the normal case q’ and $\tau$ estimate the same quantity.
The coefficient q’ has, however, a much smaller efficiency. The asymptotic
efficiency of q’ relative to the aforementioned coefficient is

$\frac{\sigma^2(\tau)}{\sigma^2(q')} = \frac{4\left[\frac{1}{9} - \left(\frac{2}{\pi}\arcsin\frac{\rho}{2}\right)^2\right]}{1 - \left(\frac{2}{\pi}\arcsin\rho\right)^2} = \frac{4}{9}$

for $\rho = 0$.

6. Tests of independence based on q’. In testing independence between x
and y it is in practice more convenient to use critical regions based on $n_1$ instead
of q’. Since, under the null hypothesis, the measure of a critical region is inde-
pendent of $F(x, y)$ ($F_1(x)$ and $F_2(y)$ are assumed to be continuous), any test
based on $n_1$ is non-parametric. We have made exact calculations of the q’-distribu-
tion for sample sizes n up to 50. For larger sample sizes the normal approximation
for $n_1$ does not seem to entail errors of practical importance.

To derive the exact distribution of $n_1$ under the null hypothesis we suppose
that n equals 2k. The probability that any k sample points shall have smaller
x-values than the other k points is

$\binom{2k}{k}^{-1}.$

Hence, since any arrangement of the sample points according to their x-values
does not affect the distribution of the y-values,

(13)  $P\{n_1 = 2r\} = \binom{k}{r}^2 \Big/ \binom{2k}{k}.$







If n = 2k + 1 it is easily verified that the probability (13) remains unchanged,
if we use the procedure in calculating $n_1$ and $n_2$ proposed in Section 2. This is,
in fact, the main reason for the proposal.
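The null distribution can be evaluated directly. The sketch below assumes that (13) reads P{n1 = 2r} = C(k, r)² / C(2k, k), the hypergeometric form implied by the argument above; the function names are hypothetical. It also computes the two-sided tail probability P{|n1 − k| ≥ v} of the kind tabulated below.

```python
from math import comb

def prob_n1(k, r):
    """P{n1 = 2r} under independence, for sample size 2k or 2k + 1."""
    return comb(k, r) ** 2 / comb(2 * k, k)

def tail_prob(k, v):
    """P{|n1 - k| >= v} under independence."""
    return sum(prob_n1(k, r) for r in range(k + 1) if abs(2 * r - k) >= v)
```

For 2k = 4 and v = 2 this gives 1/3, and for 2k = 8 and v = 2 it gives 34/70, about .486, in agreement with the entries .333 and .486 that survive in the table.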


Table of $P\{|n_1 - k| \ge \nu\}$


8 12 16 20 24 28 32 3 40 44 48 





1.000 1.000 1.000 1.000 ‘ .000 .000 .000 -000 1.000 1.000 
.333 -486 .567 .619 ‘ .684 -706 724 .740 452 .764 
.029 .O80 132 ; .220 .257 .289 .318 -343 .366 

.0022  .010 é .039 .057 .076 .094 113 -131 

0002... -0033 .0070 .012 O18 .026 .034 

-0001 .0004 .0011 .0022 .0038 .0060 

.0002 .0004 .0007 

-0001 








18 22 26 30 34 38 42 46 50 


1.000 1.000 1.000 1.000 1.000 .000 .000 -000 1.000 1.000 -000 1.000 
- 100 -206 -286 .347 .395 .434 .466 -494 517 .538 -556 .572 
.0079 .029 .057 .086 115 . 143 . 169 .194 217 .238 -258 

-0006 .0034 . 0089 O17 .027 .038 .050 .063 -076 -089 

. 0003 .0012 .0028 .0053 . 0086 -013 017 .023 

0001 .0004 .0009 .0017 .0028 .0042 

.0001 -0001 .0003 .0005 


2k is the largest even number contained in the sample size.
The distribution of $n_1$ is symmetric about $n_1 = k$ with the variance

$\frac{k^2}{2k - 1}.$

Thus, in testing independence we can for large sample sizes use

$\frac{n_1 - k}{k}\sqrt{2k - 1}$

as an approximately normally distributed random variable with mean zero
and unit s.d.
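A short numerical check of the variance and of the large-sample statistic may be helpful. The sketch assumes the null probabilities C(k, r)² / C(2k, k) for n1 = 2r and the statistic (n1 − k)√(2k − 1)/k; the function names are hypothetical.

```python
from math import comb, sqrt

def variance_n1(k):
    """Variance of n1 under independence, computed from the exact null probabilities."""
    pmf = [comb(k, r) ** 2 / comb(2 * k, k) for r in range(k + 1)]
    mean = sum(2 * r * p for r, p in enumerate(pmf))
    return sum((2 * r - mean) ** 2 * p for r, p in enumerate(pmf))

def z_statistic(n1, k):
    """Approximately standard normal under independence for large k."""
    return (n1 - k) * sqrt(2 * k - 1) / k
```

For k = 10 the exact variance equals 100/19, as the closed form k²/(2k − 1) requires, and an observed n1 = 16 gives a statistic of about 2.62.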


7. The asymptotic efficiency of the q’-test. In the case that x and y are nor-
mally distributed with the correlation coefficient $\rho$, it is possible, but rather
tedious, to calculate the power function of the q’-test. We will, therefore, restrict
ourselves to considering only the asymptotic behavior of the power function.

Consider tests of independence ($\rho = 0$) against one-sided alternatives $\rho > 0$.
Let $L_m^{(1)}(\rho)$ be the power function of the q’-test for the sample size m and $L_n^{(2)}(\rho)$
be the power function of the test based on the correlation coefficient r in a
sample of size n. We assume that all tests have the same size, i.e.

(14)  $L_m^{(1)}(0) = L_n^{(2)}(0) = \alpha$

for all m and n. We shall say that the q’-test has the asymptotic efficiency e if

(15)  $\lim \frac{\left(\partial L_m^{(1)}/\partial\rho\right)_{\rho=0}}{\left(\partial L_n^{(2)}/\partial\rho\right)_{\rho=0}} = 1$

when m and n tend to infinity in such a way that $n/m \to e$.

This means that the sample size in using the r-test need only be 100e% of
that in using the q’-test, in order to get the same derivative of the power functions
at $\rho = 0$ (for large sample sizes). Since the definition of e only concerns the
behavior in the neighborhood of $\rho = 0$, it might perhaps be more correct to call e
the asymptotic local efficiency.

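In order to calculate e we define two sequences $\{q_m\}$ and $\{r_n\}$ such that
$\{q' > q_m\}$ and $\{r > r_n\}$ are tests with the aforementioned properties. According
to (9) and (10) q’ is asymptotically normally distributed with mean q and s.d.
$\sqrt{(1 - q^2)/m}$. Furthermore, r is asymptotically normally distributed with mean
$\rho$ and s.d. $(1 - \rho^2)/\sqrt{n}$. Hence,

$1 - L_m^{(1)}(\rho) = P\{q' < q_m \mid \rho\} \sim \Phi\left[\frac{q_m - q}{\sqrt{1 - q^2}}\sqrt{m}\right],$

$1 - L_n^{(2)}(\rho) = P\{r < r_n \mid \rho\} \sim \Phi\left[\frac{r_n - \rho}{1 - \rho^2}\sqrt{n}\right],$

from which it follows

$\left(\frac{\partial L_m^{(1)}}{\partial\rho}\right)_0 \sim \Phi'(q_m\sqrt{m}) \cdot \left(\frac{\partial q}{\partial\rho}\right)_0 \sqrt{m},$

$\left(\frac{\partial L_n^{(2)}}{\partial\rho}\right)_0 \sim \Phi'(r_n\sqrt{n})\,\sqrt{n}.$

According to (14) we have

(16)  $\lim_{m \to \infty} q_m\sqrt{m} = \lim_{n \to \infty} r_n\sqrt{n} = \Phi^{-1}(1 - \alpha).$

Thus we conclude

(17)  $\lim \frac{\left(\partial L_m^{(1)}/\partial\rho\right)_0}{\left(\partial L_n^{(2)}/\partial\rho\right)_0}\sqrt{\frac{n}{m}} = \left(\frac{\partial q}{\partial\rho}\right)_0.$

Hence, according to (12) and (15),

$e = \left(\frac{\partial q}{\partial\rho}\right)_0^2 = \left(\frac{2}{\pi}\right)^2.$

In other words, the asymptotic efficiency of the q’-test is about 41%.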
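The figure of about 41% can be checked numerically. Near independence q’ has s.d. close to 1/√m and r has s.d. close to 1/√n, so equal slopes of the two power functions at ρ = 0 require n/m to equal the squared slope of q as a function of ρ at the origin. The sketch below assumes that (12) reads q = (2/π) arcsin ρ, the usual result for the bivariate normal distribution; the function names and the step size are hypothetical.

```python
import math

def q_of_rho(rho):
    """Population value of q for a bivariate normal with correlation rho."""
    return (2 / math.pi) * math.asin(rho)

def local_efficiency(h=1e-6):
    """Squared slope of q(rho) at rho = 0, by a central difference."""
    slope = (q_of_rho(h) - q_of_rho(-h)) / (2 * h)
    return slope ** 2
```

The central difference reproduces (2/π)² = 4/π², roughly 0.405.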


8. Concluding remarks. An interesting similarity exists between the q’-test 
of independence and a test of equal location parameters in two distributions, 
constructed in the following way. Suppose that two samples of equal size, say k, 
are drawn independently from two distributions. Compute the number of 
individuals, say r, in the first sample, falling short of the median of the pooled 
samples. Then the distribution of 2r under the null hypothesis is the same as 
that of m; in the q’-test for sample size 2k (or 2k + 1). The test based on r was 
discussed by F. Mosteller [7]. 

Another similarity is between the q’-test and a special case of the exact test of 
independence in a 2 x 2 table [8]. If in such a table the marginals happen to be cut 
at the 50% points the two test procedures become identical. 


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[2] F. Esscher, "On a method of determining correlation from the ranks of a variate",
Skandinavisk Aktuarietidskrift, Vol. 7 (1924), p. 201.
[3] W. Hoeffding, "A non-parametric test of independence", Annals of Math. Stat.,
Vol. 19 (1948), p. 546.
[4] M. G. Kendall, "A new measure of rank correlation", Biometrika, Vol. 30 (1938), p. 81.
[5] F. Mosteller, "On some useful 'inefficient' statistics", Annals of Math. Stat., Vol. 17
(1946), p. 377.
[6] C. Spearman, "The proof and measurement of association between two things", Am.
Jour. of Psych., Vol. 15 (1904), p. 88.
[7] F. Mosteller, "On some useful 'inefficient' statistics", unpublished thesis, Princeton
University, 1946.
[8] R. A. Fisher, Statistical Methods for Research Workers, 8th Ed., Stechert & Co., 1941.











SOME TWO SAMPLE TESTS 


By Douglas G. Chapman¹


University of Washington 














1. Introduction and summary. Stein [4] has exhibited a double sampling pro- 
cedure to test hypotheses concerning the mean of normal variables with power 
independent of the unknown variances. This procedure is here adapted to test 
hypotheses concerning the ratio of means of two normal populations, also with 
power independent of the unknown variances. The use of a two sample procedure 
in a regression problem is also considered. 

Let $\{X_{ij}\}$ (i = 1, 2) (j = 1, 2, 3, ...) be independent random variables
distributed according to $N(m_i, \sigma_i^2)$: all parameters are assumed to be unknown.

Defining k by the equation

(1)  $m_1 = k m_2$

we wish to test the hypothesis H that k has a specified value $k_0$.

If $k_0 = 1$ the hypothesis H reduces to a classical problem, often referred to
in the literature as the Behrens-Fisher problem (cf. Scheffé [3] for a bibliography).
At the present time it is still an open question whether it is possible (or desirable)
to find a non-trivial single sample test for H with the size of the critical region
independent of $\sigma_1$ and $\sigma_2$. In any case it is a simple extension of the result of
Dantzig [1] (cf. also Stein [4]) to show that no non-trivial single sample test
exists whose power is independent of $\sigma_1$ and $\sigma_2$.

On the other hand the case $k_0 \neq 1$ may be expected to occur frequently in
fields of application where a choice must be made between different products,
methods of experimentation etc. which involve different costs. The statistician
must make a choice on the basis of results relative to the ratio of costs involved.
Nevertheless this problem appears to have received little attention in the
literature.

In general tests based on a two-sample procedure may not be as "efficient"
in the sense of Wald [5] as a strict sequential procedure. On the other hand the 
two sample procedure reduces the number of decisions to be made by the experi- 
menter and it will, in certain fields, simplify the experimental procedure. 


























¹ This research was carried out while the author was at the University of California,
Berkeley, and was supported in part by the Office of Naval Research.


2. The two sample procedure. Stein's double sampling procedure (which may
be denoted procedure S) to test a hypothesis concerning the mean of a normal
population consists briefly in the following steps:

(a) Choose "a priori" a positive number z and a preliminary sample size n.

(b) Take n independent observations $x_1, \ldots, x_n$ of the random variable X
which is assumed to be distributed according to $N(m, \sigma^2)$ with unknown mean m
and unknown variance $\sigma^2$, and calculate

(2)  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$

(c) Let $N = \max\left(\left[\frac{s^2}{z}\right] + 1,\ n + 1\right)$ where [r] = largest integer < r.

(d) Take N - n more independent observations of X and choose a set of
constants $a_1, \ldots, a_N$ such that

(3)  (i) $\sum_{i=1}^{N} a_i = 1$,  (ii) $a_1 = a_2 = \cdots = a_n$,  (iii) $s^2 \sum_{i=1}^{N} a_i^2 = z$.

(e) Then $\frac{\sum_{i=1}^{N} a_i x_i - m}{\sqrt{z}}$ has Student's t-distribution with n - 1 degrees of
freedom.

Stein further showed that the procedure may be modified to some advantage 
in problems dealing with a single population. This modification is not applicable 
in the problems under consideration here. 

There remains to be discussed briefly the choice of n, z and the a’s. The pre- 
liminary sample size n may be determined by other considerations or it may be 
chosen as part of the design of the experiment. Hodges [2] has shown that the 
expected value of the total sample size N and the power of the test both depend 
on the choice of n and he has discussed the optimum choice of n with respect 
to the modified procedure of Stein. In general this optimum choice of n depends 
upon prior knowledge concerning the variance. 

The power of the test will depend upon z: some considerations concerning 
the choice of z will be dealt with after discussing the tables upon which the 
two sample tests are based. 

The arbitrariness involved in choosing the a’s may be eliminated by placing 
the additional requirement that 


(4)  $a_{n+1} = a_{n+2} = \cdots = a_N = b$ (say).

Letting $a_1 = a_2 = \cdots = a_n = a$ it is elementary to solve for a and b explicitly
viz.,

(5)  $na + (N - n)b = 1,$
     $na^2 + (N - n)b^2 = \frac{z}{s^2}.$

The solutions are

(6)  $b = \frac{1}{N}\left(1 + \sqrt{\frac{n(Nz - s^2)}{(N - n)s^2}}\right),$

(7)  $a = \frac{1 - (N - n)b}{n}.$
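The second-stage computations can be sketched as follows. This is an illustration only: the function name is hypothetical, the constraints are taken to be na + (N − n)b = 1 and s²[na² + (N − n)b²] = z as in conditions (3) with the weights restricted by (4), and the root of the resulting quadratic with the positive square root is used as an assumption (the other root satisfies the same two constraints).

```python
import math
import statistics

def stein_second_stage(first_sample, z):
    """Return (N, a, b): total sample size and the two weights of procedure S."""
    n = len(first_sample)
    s2 = statistics.variance(first_sample)      # divisor n - 1, as in (2)
    largest_int_below = math.ceil(s2 / z) - 1   # largest integer strictly less than s2/z
    N = max(largest_int_below + 1, n + 1)
    # Quadratic for b obtained by eliminating a from the two constraints.
    radicand = max(0.0, n * (N * z - s2) / ((N - n) * s2))
    b = (1 + math.sqrt(radicand)) / N
    a = (1 - (N - n) * b) / n
    return N, a, b
```

For the first sample 1, 2, 3, 4, 5 (so that s² = 2.5) and z = 0.2, the rule gives N = 13, and the returned weights satisfy both constraints to rounding error.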






























3. Test for H. The steps involved in testing the hypothesis H are
(a) Choose the preliminary sample size n, and positive numbers $z_1$, $z_2$ subject
to the restriction

(8)  $z_1 = k_0^2 z_2.$

(b) Carry out procedure S with the same n for each population, determining
two statistics $T_1$, $T_2$, i.e.

(9)  $T_i = \frac{\sum_{j=1}^{N_i} a_{ij} x_{ij}}{\sqrt{z_i}}$  (i = 1, 2).

Then $T_1 - T_2$ has, under the hypothesis tested, the distribution of the difference
of two independent Student variables.

If s denotes the difference of two independent random variables $t_1$ and $t_2$
each distributed according to Student's t-distribution with n - 1 degrees of
freedom and if $s_0$ is defined by the equation

$P(|s| > s_0) = \alpha,$

then a test of size $\alpha$ is given by the rule: H is rejected if $|T_1 - T_2| > s_0$.





4. The distribution of differences of Student variables. The distribution of s
is easily found by the method of characteristic functions, in case n is even.
Let m = n - 1 and to simplify slightly put

(10)  $y_i = \frac{t_i}{\sqrt{m}}$  (i = 1, 2).

Then the density function of $y_i$ is

(11)  $f(y) = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{m}{2}\right)} \cdot \frac{1}{(1 + y^2)^{(m+1)/2}}$

and its characteristic function

(12)  $\varphi(t) = \int_{-\infty}^{+\infty} e^{ity} f(y)\,dy$





nad (* —1 4 r) 
/ ate m—1)/2 : 
(13) ie ws. : > 2 [2(| eo 


_ m\ 2" | sa—- 3 ; 
5 m! 5 —r})! 


Formula (13) may be obtained by contour integration; it is, however, a standard 
formula in connection with Bessel functions of the second kind of purely imagi- 
nary argument (cf. Watson [6], pp. 80, 185-188). 








While it is not possible to obtain a simple general expression for

(14)  $f(w) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-itw}\,[\varphi(t)]^2\,dt,$

the density function of $w = s/\sqrt{m}$, this integral may be evaluated for m = 1, 3, 5
etc. and furthermore the density function of s may be integrated in a closed form
for such values of m, and consequently tabulated fairly easily.

In case n is odd it is possible to express $\varphi(t)$ in terms of Bessel functions but
the Bessel functions obtained are not expressible in a closed form. While the
problem may be attacked directly by numerical integration, it will generally be
sufficient to interpolate in Table I where necessary, for such values of n.

Table I gives the distribution of s for n = 2, 4, 6, 8, 10, 12. For larger values
of n it may be sufficiently accurate to use the normal approximation to the
distribution of s. In virtue of the asymptotic normality of the t-distribution s
will be distributed approximately normally with mean zero and variance $\frac{2(n - 1)}{n - 3}$
for n sufficiently large.
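Two special cases give a quick check on the tables. For n = 2 each Student variable has one degree of freedom and is a standard Cauchy variable, so the difference s is Cauchy with scale 2 and its distribution is available in closed form. For larger n the normal approximation can be evaluated directly, taking the variance of s to be 2(n − 1)/(n − 3), twice the variance of a Student variable with n − 1 degrees of freedom. The function names below are hypothetical.

```python
import math
from statistics import NormalDist

def cauchy_case_prob(s0):
    """P(0 <= s <= s0) for n = 2, where s is Cauchy with scale 2."""
    return math.atan(s0 / 2) / math.pi

def cauchy_case_critical(alpha):
    """s0 with P(|s| >= s0) = alpha for n = 2."""
    return 2 * math.tan(math.pi * (1 - alpha) / 2)

def normal_approx_prob(s0, n):
    """P(0 <= s <= s0) under the normal approximation with variance 2(n-1)/(n-3)."""
    sd = math.sqrt(2 * (n - 1) / (n - 3))
    return NormalDist(0, sd).cdf(s0) - 0.5
```

These reproduce the n = 2 entries .1476 and .2500 of Table I at s = 1 and s = 2, the 5% and 1% points 25.41 and 127.3 of Table II, and the normal-approximation entries for n = 12 (for example .1254 at s = 0.50).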


5. Power of the test. Writing

(15)  $\Delta = \frac{m_1}{\sqrt{z_1}} - \frac{m_2}{\sqrt{z_2}}$  and  $T = T_1 - T_2$

it is seen that $T = s + \Delta$ and hence

(16)  $P(H \text{ is rejected}) = P(|T| > s_0) = P(s < -s_0 - \Delta) + P(s > s_0 - \Delta).$

Since

$\Delta = \frac{m_2}{\sqrt{z_2}}\left(\frac{k}{k_0} - 1\right),$

equation (16) may be used as a guide in choosing $z_2$ so that a certain minimum
power is attained; the presence of the nuisance parameter $m_2$ makes impossible
the determination of $z_2$ so as to give exactly some preassigned power.

Since s is distributed independently of $\sigma_1$, $\sigma_2$, it follows that the power of the
test is independent of these parameters. Using the addition formula to express
the frequency function of s in terms of the frequency function of Student's
t-distribution, it may be shown that f(s) is unimodal and symmetrical about
s = 0. Hence the test is unbiased. It also follows from (16) that if $z_2$ is made to
approach zero the probability of rejecting H when it is false tends to 1: i.e.
the test is consistent.

It may be observed that tests for the one-sided hypotheses

$\frac{m_1}{m_2} \le k_0$  or  $\frac{m_1}{m_2} \ge k_0$

may easily be formulated. Table II provides a table useful for such tests also,
at half the indicated significance levels.


TABLE I
Distribution of s: difference of two independent Student variables with n - 1 degrees of freedom
The value tabled is $P(0 \le s \le s_0)$

   s_0     n = 2    n = 4    n = 6    n = 8    n = 10   n = 12   Normal approximation
                                                                 for n = 12
   0.50    .0780    .1014    .1222    .1265    .1290    .1306    .1254
   1.00    .1476    .1922    .2311    .2392    .2438    .2467    .2388
   1.50    .2048    .2660    .3185    .3290    .3349    .3386    .3313
   2.00    .2500    .3243    .3825    .3939    .4002    .4041    .3996
   2.50    .2852    .3620    .4260    .4364    .4415    .4465    .4451
   3.00    .3128    .3903    .4542    .4637    .4687    .4724    .4725
   3.50    .3348    .4104    .4726    .4796    .4834    .4856    .4874
   4.00    .3524    .4247    .4825    .4884    .4914    .4929    .4947
   4.50    .3669    .4352    .4890    .4936    .4956    .4966    .4980
   5.00    .3789    .4431    .4930    .4964    .4977
   5.50    .3890    .4491    .4955    .4980    .4988
   6.00    .3976    .4539    .4970    .4988
   6.50    .4050    .4578    .4980
   7.00    .4114    .4611    .4986
   7.50    .4170    .4638
   8.00    .4220    .4661
  10.00    .4372    .4730
  12.00    .4474    .4774
  21.00    .4698    .4870
  30.00    .4788    .4908
  50.00    .4873
 100.00    .4936

TABLE II
The 5% and 1% significance points of the distribution of s
The value tabled is $s_0$

 Significance level           n = 2    n = 4    n = 6    n = 8    n = 10   n = 12   Normal approximation
                                                                                    for n = 12
 $P(|s| \ge s_0) = .05$       25.41    10.82    3.62     3.34     3.18     3.10     3.06
 $P(|s| \ge s_0) = .01$       127.3    36.8     5.38     4.72     4.42     4.26     4.03





6. A regression problem. We consider the problem where $x_i$ are values of a
sure variable, $Y_i$ are independent random variables with

(17)  $E(Y_i) = a + bx_i$

and $\sigma_{Y_i}$ is unknown. It is desired to estimate a and b and to test the hypothesis
$b = b_0$.









The usual procedure is to assume $\sigma_{Y_i}$ constant, and use the Markov theorem
(i.e. the standard least squares formulae). In this way unbiased estimates of
a and b are obtained, whether or not this assumption is fulfilled. However the
usual significance test for b is not valid if this assumption (plus normality of
the Y's) is not fulfilled.

The two sample procedure leads to a valid test of the hypothesis $b = b_0$, with
power independent of the unknown variance. Since linearity of the expected
value of Y on x is assumed, the optimum procedure is to observe Y for only two
values of x, at opposite ends of the range. Let these points be $x_1$, $x_2$. For these
values of x, procedure S may be used (choosing $z_1 = z_2 = z$) to determine $T_1$, $T_2$
where $T_i - (a + bx_i)/\sqrt{z}$ has Student's t-distribution with n - 1 degrees of
freedom.

Then the following estimates of a, b are unbiased, for n > 3,

(18)  $\hat{a} = \frac{(x_1 T_2 - x_2 T_1)\sqrt{z}}{x_1 - x_2},$

      $\hat{b} = \frac{(T_1 - T_2)\sqrt{z}}{x_1 - x_2}.$

To test the hypothesis $H_1: b = b_0$ it is necessary only to calculate the statistic
$t = [(T_1 - T_2)\sqrt{z} - b_0(x_1 - x_2)]/\sqrt{z}$ and reject $H_1$ at the $\alpha$ level of sig-
nificance if $|t| > s_0$, where $s_0$ was defined above (Section 3).

It is seen that if b' is the true value of b, then the power of the test is a function
of $(b' - b_0)(x_1 - x_2)/\sqrt{z}$ and z may be determined to obtain any prescribed power
desired. It is also immediate that the power of the test is independent of $\sigma_{Y_i}$.
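The calculations for the regression problem are short enough to sketch in code. The example assumes that the two statistics T1 and T2 have already been obtained from procedure S at the points x1 and x2 with a common z, so that √z·Ti has expectation a + b·xi; the estimates are then the line through the two points (xi, √z·Ti). The function name and argument order are hypothetical, and the critical value s0 is supplied by the user (for example from Table II).

```python
import math

def slope_test(T1, T2, x1, x2, z, b0, s0):
    """Return (a_hat, b_hat, t, reject) for the hypothesis b = b0."""
    root_z = math.sqrt(z)
    b_hat = root_z * (T1 - T2) / (x1 - x2)
    a_hat = root_z * (x1 * T2 - x2 * T1) / (x1 - x2)
    # Test statistic: behaves like the difference of two Student variables when b = b0.
    t = ((T1 - T2) * root_z - b0 * (x1 - x2)) / root_z
    return a_hat, b_hat, t, abs(t) > s0
```

With T1 = 3, T2 = 1, x1 = 2, x2 = 0 and z = 4 the estimates are a = 2 and b = 2, and the hypothesis b = 0 yields t = 2, which is not significant at the 5% point 3.06 of the normal approximation.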

The author wishes to express thanks to the members of the computing staff 
of the Statistical Laboratory, University of California, Mrs. E. Putz, Miss J. 
Linton, and Mr. J. Blum, for assistance in preparing Tables I and II.²


REFERENCES 


[1] George B. Dantzig, "On the non-existence of tests of 'Student's' hypothesis having
power functions independent of σ", Annals of Math. Stat., Vol. 11 (1940), p. 186.

[2] Joseph L. Hodges, Jr., "The selection of initial sample size in the Stein two sample
procedure", unpublished dissertation, University of California, Berkeley, 1948.

[3] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the t-distribu-
tion", Annals of Math. Stat., Vol. 14 (1943), p. 35.

[4] Charles Stein, "A two sample test for a linear hypothesis whose power is independent
of the variance", Annals of Math. Stat., Vol. 16 (1945), p. 243.

[5] Abraham Wald, Sequential Analysis, John Wiley and Sons, Inc., 1947.

[6] G. N. Watson, A Treatise on the Theory of Bessel Functions, Cambridge University
Press, 1944.


² It has been pointed out to the writer that percent points of linear combinations of
two independent Student t's are given in Table VI (by P. V. Sukhatme) in R. A. Fisher
and F. Yates, Statistical Tables for Biological, Medical and Agricultural Research, Oliver
and Boyd, Edinburgh, 1943 (added in page proof).


NOTES 


This section is devoted to brief research and expository articles and other short items. 


TRANSFORMATIONS RELATED TO THE ANGULAR AND 
THE SQUARE ROOT 


By Murray F. Freeman and John W. Tukey¹
Princeton University 


1. Summary. The use of transformations to stabilize the variance of binomial 
or Poisson data is familiar (Anscombe [1], Bartlett [2, 3], Curtiss [4], Eisenhart 
[5]). The comparison of transformed binomial or Poisson data with percentage 
points of the normal distribution to make approximate significance tests or 
to set approximate confidence intervals is less familiar. Mosteller and Tukey [6] 
have recently made a graphical application of a transformation related to the 
square-root transformation for such purposes, where the use of "binomial
probability paper" avoids all computation. We report here on an empirical study
of a number of approximations, some intended for significance and confidence 
work and others for variance stabilization. 

For significance testing and the setting of confidence limits, we should like
to use the normal deviate K exceeded with the same probability as the number of
successes x from n in a binomial distribution with expectation np, which is
defined by

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{K} e^{-t^2/2}\,dt = \mathrm{Prob}\,\{x \le k \mid \text{binomial}, n, p\}.$

The most useful approximations to K that we can propose here are N (very
simple), N* (accurate near the usual percentage points), and N** (quite accurate
generally), where

$N = 2\left(\sqrt{(k + 1)q} - \sqrt{(n - k)p}\right).$

(This is the approximation used with binomial probability paper.)
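The simplest approximation can be compared with the exact normal deviate in a few lines. The sketch assumes that K is defined by Φ(K) = Prob{x ≤ k} and that N = 2(√((k + 1)q) − √((n − k)p)) with q = 1 − p; the function names are hypothetical. The case k = n is excluded because the binomial probability is then 1 and no finite deviate exists.

```python
import math
from statistics import NormalDist

def exact_deviate(k, n, p):
    """Normal deviate K with Phi(K) = P(x <= k) for a binomial (n, p), k < n."""
    cdf = sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))
    return NormalDist().inv_cdf(cdf)

def approx_N(k, n, p):
    """Square-root approximation N to the deviate K."""
    q = 1 - p
    return 2 * (math.sqrt((k + 1) * q) - math.sqrt((n - k) * p))
```

For n = 50, p = 0.1 and k = 5 the approximation gives about 0.40 against an exact deviate of about 0.30, an error of roughly 0.11.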


$N^* = N + \frac{N + 2p - 1}{12\sqrt{E}},$  E = lesser of np and nq,


. (N — 2)(N + 2) 1 1 ) 
* = querer nnn eS, = 
ae 12 (T5Fi Vng + 1)’ 


N* + 2p — 1 
ee me T* eS = 
N** = N* + Ts E = lesser of np and nq. 


For variance stabilization, the averaged angular transformation

$\frac{1}{2}\left[\sin^{-1}\sqrt{\frac{x}{n + 1}} + \sin^{-1}\sqrt{\frac{x + 1}{n + 1}}\right]$

has variance within ±6% of

$\frac{1}{4n + 2}$ (angles in radians),  $\frac{821}{n + \frac{1}{2}}$ (angles in degrees),

for almost all cases where np > 1.
In the Poisson case, this simplifies to using

$\sqrt{x} + \sqrt{x + 1}$

as having variance 1.


¹ Prepared in connection with research sponsored by the Office of Naval Research.


2. Significance testing. In addition to the approximations mentioned above, 
empirical study was also made of the following 


$L = \frac{x - np}{\sqrt{npq}},$

L* = L modified by a term like that in N*,


ia i I 
M 2Vn +i (sin 7 - ve); 


M* = M modified by a term like that in N*. 


Taking an upper limit of 2.5 or 3.5 on | K | and a lower limit of 0.01, 1, or 4
on np, the greatest observed errors of the approximations were smallest for
N**, N* and M* and largest for the direct approximations L and L*. This
was true for all six choices of region.
If we exclude the cases k = 0 and k = n, where the desired probability can be 
calculated directly, the largest observed errors in the substantial number of 
cases computed, which are probably representative of the regions where the 
approximations are worst, were as follows: 


|K| | E=np Nn** 


Largest observed error of 
N* Nt N 


<2.5 >4 ot 16.17 
>1 0 19 «6.20.24 
>0.01 .04 j : .1¢ .20 


>4 08.07) 19 #25 2 63 
as 11.10 38 1.26 
| 3001. 21 «4.65 —ti«C*«*SG‘“ 3.42 


Within the range of great interest, | K | < 2.5, that is .0062 < probability 
< .9938, we have errors of less than 0.04 in N** and less than 0.20 in N. 
For 1.5 < | K | < 2.5, the range of greatest interest, the average error of 


N+ was less than 0.03 and the maximum was 0.08 (54 cases considered). 







Thus, we can recommend 
N —as a simple and usually accurate transformation, 
N* —for rapid significance testing, 
N**—for adequate accuracy at all levels. 


Figure 1 shows the behavior of the various approximations in the case n = 50, 
np = 5. This is roughly typical. 
Fig. 1. Errors of approximation (n = 50, np = 5).


3. Variance stabilization. The various suggestions for stabilizing the variance
of the Poisson are:

$\sqrt{x + 1/2}$, (Bartlett [2]),
$\sqrt{x + 3/8}$, (Anscombe [1]),
$\sqrt{x} + \sqrt{x + 1}$, (this paper).

Figure 2 shows the variance of the transformed variate as a function of the
Poisson expectation. Clearly $\sqrt{x} + \sqrt{x + 1}$ is the best if small expectations
are to be considered. The simplicity with which it can be read from a square-root
table, and its unit variance, are also favorable factors.
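The variance of the transformed Poisson variate can be computed by direct summation, which gives a simple check on the claim of approximately unit variance. The function name and the truncation point of the sum are hypothetical choices; 400 terms is ample for the expectations considered in this note.

```python
import math

def transformed_variance(mean, terms=400):
    """Variance of sqrt(x) + sqrt(x + 1) when x is Poisson with the given mean."""
    p = math.exp(-mean)          # P(x = 0)
    m1 = m2 = 0.0
    for x in range(terms):
        g = math.sqrt(x) + math.sqrt(x + 1)
        m1 += p * g
        m2 += p * g * g
        p *= mean / (x + 1)      # recurrence P(x + 1) = P(x) * mean / (x + 1)
    return m2 - m1 * m1
```

At expectation 1 the variance comes out near 0.94, and at expectation 16 it is close to 1, in line with the ±6% tolerance quoted for expectations of unity or more.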

When an approximation of a given form is to work over as large a range as
possible without the magnitude of its errors exceeding a certain limit, the opti-
mum approximation is almost certain to involve errors of both signs. If ±6%
variation in variance is permissible, $\sqrt{x} + \sqrt{x + 1}$ is usable for expectations
of unity or more. It is not surprising that Anscombe's approximation, obtained
by eliminating the term in $n^{-1}$, and dominated by the term in $n^{-2}$, should only
meet the ±6% tolerance for expectations of 2.2 or more.


Fig. 2. Stabilization of Poisson variance (ratio of actual to limiting variance).


4. Scope. Values of K, and with some occasional exceptions, of L, L*,
M*, N, N*, N⁺ and N** were calculated for

n = 2, 5, 10, 20, 100,
p = 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,
k giving K < 4.5,

and similar computations were made for the Poisson case with expectations

1/100, 1/40, 1/20, 1/10, 1/5, 1/2, 1, 2, 4, 8, 16, 32, 64.







These computations were made to only two decimal places, so that the final 
results may easily err by 1, 2, or 3 in the second decimal place. 

A more complete discussion of the problem, the origin of the approximations, 
and tables showing a representative collection of actual values can be found in 
Memorandum Report 24 of the Statistical Research Group, Princeton Univer- 
sity, which bears the same title as this note. Copies may be obtained from its 
Secretary, Box 708, Princeton, N. J. 


REFERENCES 

[1] F. J. Anscombe, "The transformation of Poisson, binomial, and negative binomial
data", Biometrika, Vol. 35 (1948), pp. 246-254.

[2] M. S. Bartlett, "The square root transformation in the analysis of variance", Jour.
Roy. Stat. Soc., Suppl., Vol. 3 (1936), pp. 68-78.

[3] M. S. Bartlett, "The use of transformations", Biometrics, Vol. 3 (1947), pp. 39-51.

[4] J. H. Curtiss, "On transformations used in the analysis of variance", Annals of Math.
Stat., Vol. 14 (1943), pp. 107-122.

[5] Churchill Eisenhart, "The assumptions underlying the analysis of variance",
Biometrics, Vol. 3 (1947), pp. 1-21.

[6] Frederick Mosteller and John W. Tukey, "The uses and usefulness of binomial
probability paper", Jour. Am. Stat. Assn., Vol. 44 (1949), pp. 174-212.




REMARK ON THE ARTICLE “ON A CLASS OF DISTRIBUTIONS THAT 
APPROACH THE NORMAL DISTRIBUTION FUNCTION” BY 
GEORGE B. DANTZIG¹ 


By T. N. E. GREVILLE 
Federal Security Agency 


In this interesting and valuable article, Dr. Dantzig showed that, under 
certain conditions, a sequence of frequency distributions connected by a linear 
recurrence formula converges to the normal distribution. Among several applica- 
tions of his results which are discussed, the author mentions their relation to 
certain types of smoothing formulas, and has shown that if a linear smoothing 
formula and the data to which it is applied satisfy certain conditions, the iteration 
of the smoothing process produces a sequence of smoothed distributions which, 
upon normalization, approaches the normal frequency curve. 

In a summary paragraph at the end of the article, it is stated that ‘‘successive 
application of one or many such linear formulas will usually smooth any set of 
values to the normal curve of error.” The entire article was concerned with 
frequency distributions, and a careful reading makes it clear that the author 
intended the quoted statement to apply only to data in this form. However, its 
rather general wording seems to have led a number of readers to interpret it as 
being applicable to other types of data, such as time series, which frequently 
may not satisfy the conditions assumed. Moreover, it is easy to overlook the 


¹ Annals of Math. Stat., Vol. 10 (1939), pp. 247-253. 







restrictions imposed on both the original data and the smoothing formula as 
they are stated only by implication, and not explicitly, even though they have 
the effect of excluding important classes of smoothing formulas, such as those 
commonly employed by actuaries. 

The approach to the normal distribution is shown to depend on the vanishing 
of a certain limit, denoted as Γ, which is a function of the moments of the original 
data and of a distribution in which the weights employed in the smoothing 
formula are interpreted as frequencies. At this point, objection may be taken 
to Dr. Dantzig’s proof, since the smoothing formulas most frequently used 
contain negative weights. However, it has been shown elsewhere² that the 
occurrence of negative weights will not of itself prevent the sequence of smoothed 
distributions from approaching the normal curve. A somewhat more serious 
difficulty arises if, as is commonly the case, the smoothing formula has the 
property of reproducing polynomials of a specified degree. If the degree 
reproduced is two or more, this implies the vanishing of the second moment of the 
weight distribution, in which case the limit Γ does not vanish. In fact, it has 
been shown by DeForest³ and Schoenberg that the iteration of smoothing 
formulas which reproduce polynomials of higher degree gives rise to a sequence 
of limiting distributions which have the general appearance of the normal curve 
in the center portion and of a damped sine curve in the tails. This is, however, at 
best, a technical exception to Dantzig’s statement, as one is still faced with his 
basic proposition that repeated application of a smoothing formula to a frequency 
distribution will cause the smoothed distribution to be dominated by the char- 
acteristics of the smoothing formula rather than those of the original data. 
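
The remark that reproducing polynomials of degree two forces the second moment of the weight distribution to vanish can be checked on a small case. The sketch below is illustrative only: the five weights (-3, 12, 17, 12, -3)/35 are an assumed example, a familiar least-squares formula containing negative weights, and are not taken from Dr. Dantzig's article; the function names are hypothetical.

```python
from fractions import Fraction

# Assumed example weights for offsets -2..2; they sum to one and two are negative.
WEIGHTS = {
    -2: Fraction(-3, 35),
    -1: Fraction(12, 35),
    0: Fraction(17, 35),
    1: Fraction(12, 35),
    2: Fraction(-3, 35),
}

def weight_moment(order):
    """Moment of the given order of the weights, treated as frequencies."""
    return sum(w * offset ** order for offset, w in WEIGHTS.items())

def smooth_at(func, x):
    """Smoothed value at x: weighted sum of func over the five neighbors."""
    return sum(w * func(x + offset) for offset, w in WEIGHTS.items())

def square(x):
    return x * x

moments = [weight_moment(order) for order in range(5)]
reproduced = [smooth_at(square, x) == square(x) for x in range(-3, 4)]

print(moments)          # zeroth to fourth moments of the weights
print(all(reproduced))  # whether y = x^2 is reproduced at every test point
```

For these weights the first, second, and third moments all vanish, which is the situation in which the limit does not vanish and the normal curve is not approached.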

While he did not intend the statement to refer to data not in the form of a 
frequency distribution, some readers seem to have interpreted it as being of 
general application, and, for that reason, I should like to point out a few of the 
considerations involved in applying iterated smoothing to other types of data, 
such as, for example, a time series or the values of a mathematical function. 
The limit Γ, on whose vanishing Dantzig’s theorem depends, involves the 
second and fourth moments of the original data (as well as of the weight dis- 
tribution) and, therefore, can be computed only if these moments exist. For 
this it is necessary (but, of course, not sufficient) that the function being smoothed 
shall tend toward zero as the independent variable approaches positive or 
negative infinity. 

In order to iterate a smoothing formula an infinite number of times, it is 
obviously necessary to have an infinite set of original values. Therefore, in 
smoothing, for example, a finite time series, one would have to make some 
assumption regarding the values of the series outside the range for which they 


² I. J. SCHOENBERG, “Some analytical aspects of the problem of smoothing,” Courant 
Anniversary Volume, Interscience Publishers, New York, 1948. 

³ H. H. WOLFENDEN, “On the development of formulae for graduation by linear com- 
pounding, with special reference to the work of Erastus L. DeForest,” Trans. Actuarial 
Soc. Am., Vol. 26 (1925), pp. 81-121. 







are actually available. Of course, if it were assumed that the values were zero 
outside this range, Dantzig’s theorem would apply. However, under this assump- 
tion, infinite iteration of a smoothing formula would not be a rational procedure, 
as it would smooth each value to zero, and the incidental fact that the sequence 
of smoothed distributions, while approaching zero, also approaches the form of a 
normal distribution, would not be a very valuable one. In this connection, an 
important distinction between time series and frequency data is that, in dealing 
with the former, one is interested in the magnitude of individual values as well 
as in the general form and shape of the distribution. In practice it might be 
preferable not to make any assumption about the values outside the given 
range but rather to employ special devices to obtain smoothed values near the 
ends of this range. In such a case, the smoothing process would be a function 
of the range (if not of the actual values) of the original data distribution. Such a 
process was not considered by Dantzig, and is clearly excluded by his definition 
of a linear smoothing formula, which requires that the formula be completely 
independent of the data to which it is applied. 

The somewhat academic question of the effect of iteration of a smoothing 
formula on a function of infinite range for which the moments do not exist, is a 
difficult one, to which I cannot give a general answer. Schoenberg does not 
consider this problem, but merely gives the weight distribution to be applied 
to the original data in order to obtain the limiting smoothed distribution. Two 
trivial examples may, however, serve to illustrate the nature of the considerations 
involved. If the original data are values of a polynomial of a specified degree, 
and if a smoothing formula which reproduces that degree is successively applied, 
it will of course continue indefinitely to reproduce the original values. On the 
other hand, if the smoothing formula reproduces only polynomials of lower 
degree, a bias is introduced. As a simple example, we may consider the case of 
smoothing the function y = x^2 by a formula consisting of three weights each 
equal to 1/3 to be applied to the given value and its two immediate neighbors. 
It is easily shown that, with the neighbors at unit spacing, the smoothed value 
is x^2 + 2/3, and the effect of successive 
application of this formula is to add 2/3 each time. Thus each smoothed value 
would tend toward infinity as the number of smoothings increases; however, 
the entire distribution would always remain a parabola of the same form as 
originally. 
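
The parabola example can be verified by direct computation. The sketch below is only an illustration: it assumes the neighbors are at unit spacing, uses exact fractions, and simply drops the end values that lack a neighbor. The helper name is hypothetical. With unit spacing the constant added at each pass is [(x-1)^2 + x^2 + (x+1)^2]/3 - x^2 = 2/3.

```python
from fractions import Fraction

def smooth_once(values):
    """Apply weights (1/3, 1/3, 1/3) to each interior value and its two neighbors."""
    return [(values[i - 1] + values[i] + values[i + 1]) / 3
            for i in range(1, len(values) - 1)]

grid = list(range(-6, 7))                 # assumed unit spacing
values = [Fraction(x * x) for x in grid]  # y = x^2
added = []
for _ in range(3):
    values = smooth_once(values)
    grid = grid[1:-1]                     # one point is lost at each end
    # Set of differences from x^2; a single element means the parabola keeps its form.
    added.append({v - x * x for v, x in zip(values, grid)})

print(added)
```

Each pass yields a single common shift, so the smoothed values remain on a parabola of the original form while every ordinate grows without bound.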

Finally, I should like to emphasize that, in common with Dr. Dantzig, I 
do not regard infinite repetition of the smoothing operation as a practical pro- 
cedure, but consider it preferable to select, in the first instance, a smoothing 
formula which is likely to have the desired effect and then to perform the smooth- 
ing in a single step. In this way, one is more likely to secure the result desired 
without losing sight of important characteristics of the original data. 







INDEPENDENCE OF QUADRATIC FORMS IN NORMALLY 
CORRELATED VARIABLES¹ 


By YUKIYOSI KAWADA 


Tokyo University of Literature and Science 


The problem of giving a necessary and sufficient condition for two quadratic 
forms in normally correlated variables to be independent has been treated by many 
authors [1], [2], [3], [4], [5]. We give here another solution of this problem, 
which may be regarded as a generalization to the general case of that given by 
B. Matérn [6] for nonnegative quadratic forms. 

THEOREM 1. If two quadratic forms 

(1)    Q_1 = \sum_{i,j=1}^{n} a_{ij} x_i x_j,    Q_2 = \sum_{i,j=1}^{n} b_{ij} x_i x_j 

in normally correlated variables x_1, ..., x_n with zero means and with the variance 
matrix I satisfy the following four conditions 

(2)    F_{ij} = E(Q_1^i Q_2^j) - E(Q_1^i) E(Q_2^j) = 0    (i, j = 1, 2), 

then the relation 

(3)    AB = 0    (A = (a_{ij}), B = (b_{ij})) 

holds. 

COROLLARY 1. If Q_1, Q_2 in (1) satisfy the four conditions (2), then Q_1 and Q_2 
are independent. 

COROLLARY 2. (Necessity portion of the theorem of Craig) A necessary 
condition for the independence of Q_1 and Q_2 is AB = 0. (The sufficiency was 
proved by Craig.) 

PROOF OF THEOREM 1. The proof is very simple. Using the values E(x_k^i) = 0 
(i = 1, 3, 5, 7), E(x_k^2) = 1, E(x_k^4) = 3, E(x_k^6) = 15, E(x_k^8) = 105 (k = 1, ..., n), 
we have by a straightforward calculation² the following relations 

(4)    F_{11} = 2 Tr(AB), 

(5)    F_{12} = 8 Tr(AB^2) + 4 Tr(AB) Tr(B), 

(6)    F_{21} = 8 Tr(A^2 B) + 4 Tr(AB) Tr(A), 

(7)    F_{22} = 32 Tr(A^2 B^2) + 16 Tr((AB)^2) + 16 Tr(AB^2) Tr(A) + 16 Tr(A^2 B) Tr(B) 
              + 8 Tr(AB) Tr(A) Tr(B) + 8 (Tr(AB))^2. 


¹ Presented at the Chapel Hill meeting of the Institute of Mathematical Statistics and 
Biometric Society March 18, 1950. 

² If we apply an orthogonal transformation on (x_1, ..., x_n) so that A becomes a diagonal 
form, the calculation becomes simpler than with the general form. We may note here also 
the fact that we need not assume that x_1, ..., x_n are normally correlated, but we use 
only the values of E(x_k^i) (i = 1, ..., 8) for our proof. 







Put C = AB. Let C' be the transposed matrix of C. We have from (2), (4)-(7) 

(8)    2 Tr(A^2 B^2) + Tr((AB)^2) = 2 Tr(CC') + Tr(C^2) = 0. 

The left side of (8) is equal to \sum_{i,j=1}^{n} (c_{ij}^2 + c_{ij} c_{ji} + c_{ji}^2), which is positive un- 
less all c_{ij} = 0 (i, j = 1, ..., n). Hence we have C = AB = 0, q.e.d. 
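
The algebra behind relation (8) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: A and B are arbitrary symmetric integer matrices chosen for the example, and the helper names are hypothetical. For C = AB it evaluates 2 Tr(A^2 B^2) + Tr((AB)^2), then 2 Tr(CC') + Tr(C^2), then the double sum of c_{ij}^2 + c_{ij} c_{ji} + c_{ji}^2, which should all agree.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def relation_8_forms(A, B):
    """Return the three expressions for the left side of (8), with C = AB."""
    C = matmul(A, B)
    n = len(C)
    first = 2 * trace(matmul(matmul(A, A), matmul(B, B))) + trace(matmul(C, C))
    second = 2 * trace(matmul(C, transpose(C))) + trace(matmul(C, C))
    third = sum(C[i][j] ** 2 + C[i][j] * C[j][i] + C[j][i] ** 2
                for i in range(n) for j in range(n))
    return first, second, third

# Assumed example: symmetric matrices with AB != 0.
A = [[2, -1, 0], [-1, 3, 1], [0, 1, -2]]
B = [[1, 4, -2], [4, 0, 5], [-2, 5, 3]]
general = relation_8_forms(A, B)

# Assumed example with AB = 0: every expression vanishes.
vanishing = relation_8_forms([[1, 0], [0, 0]], [[0, 0], [0, 5]])

print(general, vanishing)
```

Each term c_{ij}^2 + c_{ij} c_{ji} + c_{ji}^2 is at least (c_{ij}^2 + c_{ji}^2)/2, so the sum can vanish only when every c_{ij} is zero.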

Corollary 1 follows from Theorem 1 and the theorem of Craig. Corollary 2 
results from observing that independence of Q_1 and Q_2 implies (2). 

B. Matérn proved that if A, B are nonnegative, then AB = 0 follows from the 
single condition F_{11} = 2 Tr(AB) = 0. If only one of the matrices A, B is assumed 
to be nonnegative, we have 

THEOREM 2. Let A be nonnegative. Then from the two conditions F_{11} = 0, F_{12} 
= 0 in (2) follows the relation AB = 0. 

PROOF. From (4), (5) follows Tr(AB^2) = 0. Since A is nonnegative, we can 
choose a real symmetric matrix A_0 such that A = A_0^2. Put C_0 = A_0 B. Then 
we have Tr(AB^2) = Tr(C_0 C_0') = 0 and from this follows C_0 = 0. Hence we have 
also AB = A_0 C_0 = 0, q.e.d. 


REFERENCES 


[1] A. T. CRAIG, “Note on the independence of certain quadratic forms”, Annals of Math. 
Stat., Vol. 14 (1943), pp. 195-197. 

[2] H. HOTELLING, “Note on a matric theorem of A. T. Craig”, Annals of Math. Stat., 
Vol. 15 (1944), pp. 427-429. 

[3] H. SAKAMOTO, “On the independence of two statistics”, Research Memoirs of Inst. of 
Stat. Math., Tokyo, Vol. 1 (1944), pp. 1-25 (in Japanese). 

[4] K. MATUSITA, “Note on the independence of certain statistics”, Annals of Inst. of 
Stat. Math., Tokyo, Vol. 1 (1949), pp. 79-82. 

[5] J. OGAWA, “On the independence of bilinear and quadratic forms of a random sample 
from a normal population”, Annals of Inst. of Stat. Math., Tokyo, Vol. 1 (1949), 
pp. 83-108. 

[6] B. MATÉRN, “Independence of non-negative quadratic forms in normally correlated 
variables”, Annals of Math. Stat., Vol. 20 (1949), pp. 119-120. 


ERRATA TO “CONTROL CHART FOR LARGEST 
AND SMALLEST VALUES” 


By JOHN M. HOWELL 
Los Angeles City College 


In the paper cited in the title (Annals of Math. Stat., Vol. 20 (1949), p. 306), 
there are some numerical errors in Table I. Values of d2/2 and d, are given by 
H. J. Godwin in ‘Some Low Moments of Order Statistics’ in the same issue 







of the Annals. These values are more accurate than those heretofore available. 
A corrected Table I based on these values is as follows: 


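   n      ds        A2        A3 

   2    .8256    1.8800    3.0411 
   3    .7480    1.0233    3.0902 
   4    .7012     .7286    3.1330 
   5    .6690     .5768    3.1699 
   6    .6449     .4832    3.2020 
   7    .6260     .4193    3.2303 
   8    .6107     .3725    3.2556 
   9    .5978     .3367    3.2784 
  10    .5868     .3083    3.2992 

[Column headings as printed: n, d2/2, ds, A2, A3, As. Only three numerical columns are 
legible in this copy; their assignment to the headings ds, A2, A3 is inferred from position.] 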


ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Berkeley meeting of the Institute, 
August 5, 1950) 


1. Sampling from Populations with Overlapping Clusters. Z. W. BIRNBAUM, 
University of Washington, Seattle. 


In cluster sampling it is usually assumed that the clusters are disjoint. In this paper 
situations are considered in which this assumption is not fulfilled. Let the population π 
consist of N individuals “j”, having the variates V[j], j = 1, 2, ..., N, and let K clusters 
C[i], i = 1, 2, ..., K, be such that each “j” belongs to at least one cluster. Let s[j] ≥ 1 
be the number of different clusters to which “j” belongs (the multiplicity of “j”). The 
cluster C[i] contains N_i individuals with the variates V[i, t], t = 1, 2, ..., N_i; 
i = 1, 2, ..., K. In a sampling procedure, let sub-sample sizes n[i] be given for each C[i], 
and weights A[i, t] for each V[i, t]; a random sample of k clusters C[i_u], u = 1, 2, ..., k 
is obtained, then n[i_u] individuals are sampled from C[i_u], and for each of them its vari- 
ate and its multiplicity are recorded. Necessary and sufficient conditions are derived for 

    S = \sum_{u=1}^{k} \sum_{v=1}^{n[i_u]} V[i_u, t_v] A[i_u, t_v] 

being an unbiased estimate of \bar{V} = \frac{1}{N} \sum_{j=1}^{N} V[j]. The 
variance of S is found, the weights are studied which minimize this variance, and some 
practically important special cases are derived. 


2. A Simple Nonparametric Test of Independence. NILS BLOMQVIST, University 
of Stockholm. 


Consider a sample of size n from a two-dimensional distribution F(x, y). Let \tilde{x} and \tilde{y} 
denote the two sample medians and compute the number of individuals, say k, satisfying 
the inequality x < \tilde{x}, y < \tilde{y} (the trivial difficulty arising when n is an odd number can 
easily be overcome). A test of independence based on k is nonparametric. As a matter of 
fact one has under the null hypothesis that 

    P(k) = \binom{m}{k}^2 / \binom{n}{m}, 

where m = [n/2]. In the case of normal F with correlation coefficient ρ it is possible to 
show, by studying the asymptotic behavior of the power function of the test in the neigh- 
borhood of ρ = 0, that the asymptotic efficiency of the test is (2/π)^2, or about 41%. This 
result is based on the fact that k has an asymptotically normal distribution if some regu- 
larity conditions are fulfilled. In spite of its low efficiency it is suggested that the test be 
used in cases where some information can be neglected in favor of the simplicity of the 
method. 
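
The null distribution of k and the quoted efficiency can be tabulated in a few lines. This sketch rests on an assumption about the printed formula: for even n = 2m it takes P(k) to be the hypergeometric probability C(m, k)^2 / C(n, m). The function name is hypothetical and odd n is not handled.

```python
from fractions import Fraction
from math import comb, pi

def null_distribution(n):
    """Assumed null law of k for even n = 2m: P(k) = C(m, k)^2 / C(n, m)."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch assumes an even sample size n = 2m")
    m = n // 2
    return {k: Fraction(comb(m, k) ** 2, comb(n, m)) for k in range(m + 1)}

pmf = null_distribution(10)
efficiency = (2 / pi) ** 2   # asymptotic efficiency quoted in the abstract

for k, p in pmf.items():
    print(k, p)
print(round(efficiency, 3))
```

The probabilities sum to one by Vandermonde's identity, which is a quick consistency check on this reading of the formula.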


3. On Minimax Statistical Decision Procedures and Their Admissibility. COLIN 
R. BLYTH, University of California, Berkeley. 


The problem considered is that of using a sequence of observations on a random variable 
X to make a decision. Two loss functions W_1 and W_2, each depending on the distribution 
F of X, the number n of observations taken, and the decision δ made, are assumed given. 
Minimax problems can be stated for weighted sums of W_1 and W_2, or for either one subject 
to an upper bound on the expectation of the other. Under suitable conditions it is shown 
that solutions of the first type of problem provide solutions for all problems of the latter 
types, and that admissibility for a problem of the first type implies admissibility for prob- 
lems of the latter types. Two examples are given: estimation of E X when X is (1) normal 
with known variance, (2) rectangular with known range. The two loss functions are in 
each case W_1 = n and an arbitrary nondecreasing function W_2(|δ - θ|). Admissible 
minimax estimates are obtained. Extensions to any function W_1(n) are indicated; two 
examples are given for the normal case where the sample size must be randomised among 
more than a consecutive pair of integers. 


4. Sufficient Statistics and Unbiased Estimates for “Selected” Distributions. 
DOUGLAS G. CHAPMAN, University of Washington, Seattle. 


A family of distributions obtained from any given family by fixed selection may be 
called a “selected” family. Tukey’s theorem that such selected families admit the same 
set of sufficient statistics as the parent family is proved for an extended class of distribu- 
tions. Further if the selection does not involve truncation the existence of minimum vari- 
ance unbiased estimates of parameters of the parent family ensures the existence of similar 
estimates for the selected family. Some results are derived for minimum variance unbiased 
estimates for truncated distributions. 


5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. 
DAVIS, U. S. Naval Ordnance Testing Station, China Lake. 


Under weak regularity conditions it is shown that for the case in which the sample size 
is a nonrandom variable, certain lower bounds are unattainable. Consider a univariate 
chance variable X, possessing an absolutely continuous distribution function F(x, θ), in 
which θ is the unknown parameter. Under quite general regularity conditions Barankin 
has proved the existence and uniqueness of the locally best unbiased estimate of a func- 
tion g(θ) for a specified parameter value θ_0. The criterion of bestness is the minimization 
of the s-th absolute central moment (s > 1) of the estimate about g(θ_0), and Barankin has 
obtained an expression for the lower bound both in the general case and in particular for 
a case which yields a generalization of the Cramér-Rao inequality valid for any s-th ab- 
solute central moment. It is the latter lower bound with which we are concerned. With 
an additional weak assumption concerning the density function of X, it is shown that if 
φ_s(x_1, x_2, ..., x_n) is the locally best unbiased estimate of g(θ) (obtained by Barankin) 
for each fixed sample size n and for each s > 1, then there exists no probability distribution 
F(x, θ) except for s = 2 yielding a sequence {φ_s(x_1, x_2, ..., x_n)} (n = 1, 2, ..., ad inf.) 
in which x_1, x_2, ..., x_n are for each n independently and identically distributed chance 
variables and for which φ_s(x_1, x_2, ..., x_n) attains for each n the special lower bound given 
by Barankin. Obviously in the case s = 2, the lower bound is achieved by an efficient sta- 
tistic if one exists. 


6. A Note on the Power of the Sign Test. T. A. JEEVES AND ROBERT RICHARDS, 
University of California, Berkeley. 


Values obtained by using the normal approximation to the noncentral t-distribution 
given by Johnson and Welch were compared with exact values given by Neyman and 
Tokarska. The comparison indicated that efficiencies of the sign test computed from the 
approximation would be consistently higher than the true efficiencies. To avoid this bias 
the sign test was randomized so that levels of significance of α = .05 and α = .01 were 
obtained and the exact values of the noncentral t used. Efficiencies were computed using 
various measures of equivalence of the power functions: (1) balancing the area (Walsh), 
(2) minimizing the maximum difference, (3) equalizing the power at certain fixed points. 
The various measures of equivalence yielded no marked differences in efficiencies. Tables 
were given of the efficiencies for small n. The efficiency for α = .05 was about .7 for n be- 
tween 6 and 20 and somewhat higher for α = .01. The efficiency slowly approaches the 
asymptotic value of 2/π = .6366 as n increases. 


7. About Some Classes of Sequential Procedures for Obtaining Confidence 
Intervals of Given Length. (Preliminary Report). WERNER R. LEIMBACHER, 
University of California, Berkeley. 


The special class C_1 of such procedures indicated by A. Wald (Sequential Analysis, John 
Wiley & Sons, 1947, pp. 145-156) can be extended by generalizing and improving the in- 
equality on which the procedures are based. It is shown that even in this larger class C_2, 
a procedure could possibly be optimum only under very special circumstances. The well 
known optimum procedure for a normal distribution N(θ, 1) can be obtained as the limit 
of a sequence of procedures from C_2. For the suggested sequence, however, the limit no 
longer belongs to C_2. In order to eliminate various deficiencies of C_2, a modified class C_3 
is proposed which contains the well known optimum procedures for the normal and rec- 
tangular distributions. The method indicated seems suggestive for the general case of 
estimating location parameters by confidence intervals. 


8. On the Stochastic Independence of Symmetric and Homogeneous Linear 
and Quadratic Statistics. EUGENE LUKACS, U. S. Naval Ordnance Testing 
Station, China Lake. 


It is known that the sampling distributions of the mean and of the variance are stochas- 
tically independent if and only if the parent distribution is normal. This was proven by 
R. C. Geary (Jour. Roy. Stat. Soc., Suppl., Vol. 3 (1936)) and using a different method by 
E. Lukacs (Annals of Math. Stat., Vol. 13 (1942)). The question arises whether there are 
any distributions having the property that the sampling distributions of the mean and of a 
symmetric and homogeneous quadratic statistic are independent. It can be shown that 
there are only the following possibilities: (1) the parent distribution is normal, (2) the 
parent distribution is degenerate with a single saltus of one, (3) the parent distribution is 
a step function with two steps, located symmetrically with respect to zero, (4) the parent 
distribution is a gamma distribution. 


9. The Distribution of the Maximum Deviation between Two Sample Cumula- 
tive Step Functions. Frank J. Massey, Jr., University of Oregon. 


Let x_1 < x_2 < ... < x_n and y_1 < y_2 < ... < y_m be the ordered results of two random 
samples from populations having continuous cumulative distribution functions F(x) and 
G(x) respectively. Let S_n(x) = k/n when k is the number of observed values of X which 
are less than or equal to x, and similarly let S'_m(y) = j/m where j is the number of observed 
values of Y which are less than or equal to y. The statistic d = \max_x | S_n(x) - S'_m(x) | can 
be used to test the hypothesis F(x) = G(x), where the hypothesis would be rejected if the 
observed d is significantly large. In this paper a method of obtaining the exact distribution 
of d for small samples is described, and a short table for equal size samples is included. 
The general technique is that used by the author for the single sample case. There is a 
lower bound to the power of the test against any specified alternative. This lower bound 
approaches one as n and m approach infinity proving that the test is consistent. 
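
The statistic d defined in this abstract is simple to compute for given samples. The following is a minimal sketch with hypothetical function names; it evaluates both sample step functions at every observed value, which suffices because the step functions change only there. It does not attempt the exact distribution of d described in the paper.

```python
from bisect import bisect_right
from fractions import Fraction

def step_function(sample):
    """Return the sample cumulative step function z -> (number of values <= z) / size."""
    ordered = sorted(sample)
    size = len(ordered)
    return lambda z: Fraction(bisect_right(ordered, z), size)

def max_deviation(xs, ys):
    """Maximum absolute difference between the two sample step functions."""
    s_x, s_y = step_function(xs), step_function(ys)
    # Both functions are constant between observed values,
    # so the maximum is attained at an observed value.
    return max(abs(s_x(z) - s_y(z)) for z in list(xs) + list(ys))

d_separated = max_deviation([1, 2, 3], [4, 5, 6])
d_interleaved = max_deviation([1, 3, 5], [2, 4, 6])
print(d_separated, d_interleaved)
```

Completely separated samples give the largest possible value, d = 1, while interleaved samples of equal size give the smallest value attainable without ties, 1/n.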


10. An Iterative Construction of the Optimum Sequential Decision Procedure 
with Linear Cost Function. LINCOLN E. MOSES, Stanford University. 


Where the cost of taking n observations is proportional to n, define a sequential decision 
procedure D_T by means of its associated “stopping region” T; T is the set of a posteriori 
probability distributions ξ(θ) for which D_T instructs the statistician to take no observa- 
tion and to make the decision which minimizes the Bayes risk. Now let D_T be any sequen- 
tial decision procedure which has uniformly bounded average risk for every a priori dis- 
tribution ξ(θ). Define T' as the derived region of T: T' is the set of ξ(θ) such that the Bayes 
risk of stopping at ξ(θ) is not greater than the risk of taking one observation and then 
using D_T. Define T^{(n+1)} = (T^{(n)})'. Then it is shown that the sequence of regions {T^{(n)}}, 
n = 1, 2, ..., is monotonically decreasing to a limit region, and that the procedure 
associated with this limit region is the optimum sequential decision procedure. 
Some numerical examples are given where the exact solution 
is obtained and the convergence of the iteration is examined. (This paper was prepared 
under the sponsorship of the Office of Naval Research.) 


11. On the Law of the Iterated Logarithm for Dependent Random Variables. 
STANLEY W. NASH, University of California, Berkeley. 


The order of the remainder term is evaluated in the distribution function of the asymp- 
totically normal sum S_n of dependent random variables of a certain class considered by 


Loéve. Bounds are found for the probability that max | S, | = Bir, where B, is the sum 
kin 


of the variances of components of S_n. Given an infinite sequence of events A_n, a nec- 
essary and sufficient condition is found for the probability that infinitely many A_n 
occur to equal one. This criterion extends criteria due to Borel. With these results estab- 
lished, the law of the iterated logarithm is shown to hold for a wide subclass of Loève’s 
class of dependent random variables. Within this class the partial sum S_n - S_k may ap- 
proach normality with a speed which depends in a certain functional way on the previous 
sum S_k, and which may be arbitrarily slow for some values of S_k. The conclusions gener- 
alize earlier results due to W. Doeblin and N. A. Sapogov. 


12. Conditional Expectation and the Efficiency of Estimates. PAUL G. HOEL, 
University of California, Los Angeles. 


A probability density function, f(x; θ), is considered for which the range of x does not 
depend on θ and for which there exists a sufficient statistic for θ. It is shown that under 
certain regularity conditions, there exists a unique unbiased sufficient estimate of θ among 
those sufficient estimates which can be expressed as functions of a particular sufficient 
statistic. This result, together with results of other authors, is used to show that for the 
class of statistics satisfying the regularity conditions, the method of Blackwell for im- 
proving an unbiased estimate of θ does not yield an essentially better estimate than a well 
known estimate. 




















13. Optimum Estimates for Location and Scale Parameters. RAYMOND P. 
PETERSON, University of California and National Bureau of Standards, 
Los Angeles. 


Let h_i(W | E, θ) = W(\hat{θ}_i, θ) p(E | θ), where p(E | θ) is the joint probability density 
function of the n (not necessarily independent) sample values x_1, ..., x_n which may be 
represented as a point E = (x_1, ..., x_n) in the n-dimensional Euclidean sample space M. 
The unknown parameters, θ_1, ..., θ_s, may be represented as a point θ = (θ_1, ..., θ_s) 
in the s-dimensional Euclidean parameter space Ω. W(\hat{θ}_i, θ) is a real-valued, nonnega- 
tive, measurable weight function, defined for all E in M and θ in Ω, which represents the rela- 
tive seriousness of taking the estimate \hat{θ}_i(E) as the value of θ_i for any particular sample 
point E. Let G(θ) be the unknown cumulative distribution function of θ. Then θ_i^*(E) is 
defined to be a best estimate of θ_i, provided that, if \hat{θ}_i(E) is any other estimate of θ_i in 
the class under consideration, I - I^* ≥ 0, where 

    I = \int_{Ω} \int_{M} h_i(W | E, θ) dE dG(θ). 

Let 

    r_i(θ) = \int_{M} h_i(W | E, θ) dE,    φ_i(E) = \int_{Ω} h_i(W | E, θ) dθ. 

A general theorem is proved to the effect that if h_i(W | E, θ) is measurable over the product 
space M × Ω and if r_i(θ) and φ_i(E) are uniformly convergent integrals, then a best estimate 
θ_i^*(E) of θ_i exists provided that r_i(θ) is constant and that θ_i^*(E) minimizes φ_i(E) for all 
points E in M. General methods are obtained for constructing best estimates for location 
and scale parameters, separately or jointly, and for functions of location and scale param- 
eters from several populations. As special cases, results are derived which are analogous 
to converses of Theorems 1 and 2 in Kallianpur’s “Minimax Estimates of Location and 
Scale Parameters”, Abstract, (Annals of Math. Stat., Vol. 21 (1950), pp. 310-311). 




NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


Professor William Feller of Cornell University has been appointed Eugene 
Higgins Professor of Mathematics at Princeton University. 

Dr. Leonard Kent, formerly on the staff at the University of Chicago in the 
School of Business, is now with the firm of Alderson and Sessions, 1905 Walnut 
Street, Philadelphia 3, Pennsylvania. 

Dr. G. B. Oakland has resigned an associate professorship of statistics at the 
University of Manitoba to accept the position as Head of Biometrics Unit, 
Division of Administration, Department of Agriculture, Ottawa. 

Dr. Norman Rudy has accepted an appointment as Assistant Professor at 
Sacramento State College, Sacramento, California. 

Professor G. R. Seth has returned to India to accept the position of Professor 
of Statistics and Deputy Statistical Advisor to the Indian Council of Agricultural 
Research, New Delhi. 












Mr. Eric Weyl, textile engineering consultant, formerly of Manchester, New 
Hampshire, has moved his office to 2509 Vail Avenue, Charlotte, North Carolina. 
Mr. Weyl, a specialist in cotton spinning, serves as regular consultant to many 
leading textile mills. 




The completion and successful operation of SEAC—the National Bureau of 
Standards Eastern Automatic Computer—has been achieved by electronic scien- 
tists of the National Bureau of Standards. SEAC is a high-speed, general-purpose, 
automatically-sequenced electronic computer. It was developed and constructed, 
in a period of 20 months, by the staff of the National Bureau of Standards under 
the sponsorship of the Department of the Air Force to provide a high-speed 
computing service for Air Force Project SCOOP (Scientific Computation of 
Optimum Programs), a pioneering effort in the application of scientific principles 
to the large-scale problems of military management and administration. SEAC 
will also be available for solving important NBS problems of general scientific 
and engineering interest. 




New Members 
The following persons have been elected to membership in the Institute 


(June 1, 1950 to August 31, 1950) 


Aven, Russell E., M.A. (Univ. of Miss.), Graduate student, University of Mississippi, 
1511 North Main St., Water Valley, Mississippi. 

Bamberger, Gunter, Dip.-Math. (Univ. Göttingen), Division head in the Statistical Office 
of the City of Cologne, Manderscheider Platz 12, Cologne-Sulz, Germany. 

Bangdiwala, Ishver S., M.S. (Univ. N. C.), Graduate student, University of North Caro- 
lina, 210 A. Phillips Hall, University of North Carolina, Chapel Hill. 

Borch, Karl Henrik, M.Sc. (Oslo Univ.), Field Science Officer for Middle East, UNESCO, 
19 Avenue Kléber, Paris 16e, France. 

Buch, Kai R., M.Sc., Assistant Professor, Technical University of Denmark, Figaardsvej 
14 A?, Charlottenlund, Denmark. 

Carranza, Roque G., Ingeniero Industrial (Univ. Buenos Aires), Consultant Industrial 
Engineer, Parana 56, Buenos Aires, Argentina. 

Dominguez, Alberto G., Ph.D. (Univ. Buenos Aires), Professor of Mathematics, Facultad 
de Ciencias Exactas, Fisicas y Naturales, University of Buenos Aires, Paraguay 1327, 
Buenos Aires, Argentina. 

Dunaway, William L., B.S. (Univ. of Calif.), Graduate student, Dept. of Mathematical 
Statistics, University of California, 4820 Cahuenga Boulevard, North Hollywood, Cali- 
fornia. 

Fernandez, Jose J., Professor, University of Costa Rica, Ap. 1313, San Jose, Costa Rica. 

Fortet, Robert, Ph.D. (Paris), Professor, Department of Science de Caen, 168 Rue Capo- 
niere, Caen (Calvados), France. 

Geppert, Maria-Pia, Ph.D. (Univ. of Giessen), Lecturer, University of Frankfurt; Head 
of Statistical Laboratory, Kerckhoff-Institute, Bad Nauheim; Lecturer, Technical 
High School, Darmstadt, Germany. 

Gortler, J. Henry, Ph.D. (Univ. of Göttingen and Univ. of Giessen), Professor of Applied
Mathematics and Dean of the Faculty of Natural Sciences and Mathematics, University
of Freiburg i. Br.; Manager of “Gesellschaft für angewandte Mathematik und Mechanik”;
Stadtstrasse 57, Freiburg i. Br., Germany.

Guilbaud, George T., Agrege de l’Univ. (Paris), Chief, Section a l’Institute of Science
Economique Appliquee, Paris, and Professor, Institute of Statistics, University of 
Paris, 35 Boulevard des Capucines, Paris 2, France. 

Holloway, Clark, Jr., M.S. (Univ. of Ill.), Process Research Engineer, Gulf Research and 
Development Co., P.O. 2088, Pittsburgh 30, Pennsylvania. 

Lieberman, Gilbert, M.A. (Columbia Univ.), Mathematician, U.S. Naval Research Labora- 
tory, 220 Newcomb St., S.E., Washington 20, D.C. 

Lomax, K. S., M.A. (Manchester Univ.), Lecturer in Economic Statistics, Economics De- 
partment, The University, Manchester, England. 

Lorenz, Paul, Ph.D., Professor, University of Berlin, Kaiserstuhlstrasse 21, Berlin-Schlach- 
tensee, Germany. 

Lunger, George F., M.B.A. (Univ. of Mich.), Statistician, Great Lakes Investigations, Fish 
and Wildlife Service, Department of the Interior, 2110 Arbor View Blvd., Ann Arbor,
Michigan. 

Maggy, Robert K., M.A. (Univ. of Calif.), Graduate student, University of California, 
1685 Euclid Avenue, Berkeley 9, California. 

McElrath, Gayle W., M.S. (Univ. of Mich.), Assistant Professor, Department of Engineer- 
ing, 208 Main Engineering Building, University of Minnesota, Minneapolis, Minnesota. 

Neisius, W. Vincent, M.S. (Emory Univ.), Mathematics Instructor, Georgia Institute of 
Technology, 597 St. Charles Avenue, N.E., Atlanta &, Georgia. 

Perloff, Robert, M.A. (Ohio State Univ.), Graduate student and Research Assistant, Re- 
search Foundation, Ohio State University, 1281 Bryden Road, Columbus 6, Ohio. 

Peter, Hans, Dr. rer. pol., Professor of Economics, University of Tübingen,
Tübingen-Waldhausen 29, Germany.

Putter, Joseph, M.Sc. (Hebrew Univ., Jerusalem), International House, Berkeley 4, Cali- 
fornia. 

Rankin, Bayard, A.B. (Univ. of Calif.), Graduate student, University of California, Inter- 
national House, Berkeley 4, California. 

Reid, Albert T., B.S. (Iowa State College), Research Assistant in Mathematical Biology, 
Committee on Mathematical Biology, University of Chicago, 5741 Drexel Avenue, 
Chicago 37, Illinois. 

Shaw, Albert, B.S. (Univ. of Alberta), Lecturer, University of Alberta, Department of 
Mathematics, University of Alberta, Edmonton, Alberta, Canada. 

Shuhany, Elizabeth, A.M. (Boston Univ.), Assistant Instructor in Statistics and Assistant 
in Statistical Laboratory of Mathematics, Boston University, 725 Commonwealth 
Avenue, Boston 15, Massachusetts. 

Stewart, John N., B.A. (Univ. of Michigan), Graduate student, University of Michigan, 
4834 Chatsworth, Detroit 24, Michigan. 

Strecker, Heinrich, Doctor der Naturwissenschaften (Univ. Munchen), Mathematical 
Statistician in the Bavarian Statistical Office, Rosenheimerstrasse 130, Munich 8, 
Germany. 

Vaswani, Sundri (Miss) Ph.D. (Univ. of London), Research Associate in Statistics, c/o 
Ahmedabad Textile Industry’s Research Association, P.O. Box 170, Ahmedabad, India. 




REPORT OF THE BERKELEY MEETING OF THE INSTITUTE 


The forty-fourth meeting of the Institute of Mathematical Statistics was
held on August 5, 1950, on the Berkeley campus of the University of California,
in conjunction with the Second Berkeley Symposium on Mathematical Statistics
and Probability which met from July 31 through August 12. Other organizations
cooperating with the Symposium were the Biometrics Section of the American
Statistical Association, The Western North American Region of the Biometric
Society, the Econometric Society, the Institute of Transportation and Traffic
Engineering of the University of California, and the Office of Naval Research.
Some 218 persons registered for the Symposium, including the following 106
members of the Institute:


T. W. Anderson, Fred C. Andrews, Jane F. Andrian, Kenneth J. Arrow, Edward W. 
Barankin, Helen P. Beard, Robert D. Bedwell, Blair M. Bennett, Joseph Berkson, Z. W.
Birnbaum, David Blackwell, E. Blanco, Nils Blomqvist, Julius R. Blum, Colin R. Blyth, 
A. H. Bowker, George W. Brown, Douglas G. Chapman, C. L. Chiang, K. L. Chung, William 
G. Cochran, Harald Cramér, Edwin L. Crow, J. H. Curtiss, R. C. Davis, W. J. Dixon, J. L. 
Doob, A. Dvoretzky, Mary Elveback, Benjamin Epstein, Mark W. Eudey, Edward A. Fay, 
William Feller, Edgar H. Fickenscher, E. Fix, William R. Gaffey, Robert S. Gardner, S. G.
Ghurye, M. A. Girshick, Paul Gutt, Jack C. Gysbers, T. E. Harris, J. L. Hodges, Jr., Wassily
Hoeffding, Paul G. Hoel, Harold Hotelling, John M. Howell, Harry M. Hughes, R. F. 
Jarrett, T. A. Jeeves, Mark Kac, Joseph Kampé de Fériet, E. S. Keeping, Ryoichi Kikuchi,
Wilfred M. Kincaid, H. S. Konijn, Charles H. Kraft, George M. Kuznets, E. L. Lehmann,
Roy B. Leipnik, Paul Levy, M. Loéve, Arvid T. Lonseth, Eugene Lukacs, C. A. Magwire, 
Jacob Marschak, Thomas Marschak, F. J. Massey, Jr., A. M. Mood, Lincoln E. Moses, 
James T. McWilliam, Stanley W. Nash, J. Neyman, Howard C. Nielson, Gottfried E.
Noether, Stefan Peters, John C. Petersen, Raymond P. Peterson, Robert I. Piper, Joseph 
Putter, Robert R. Putz, Bayard Rankin, Fred D. Rigby, David Rubinstein, Elizabeth L. 
Scott, Esther Seiden, Arthur Shapiro, Richard H. Shaw, Ronald W. Shephard, W. B. Simpson,
Monroe Sirken, M. Sobel, Herbert Solomon, A. L. Stewart, Donald E. Stiling, G.
Szego, Robert Tate, William F. Taylor, Leo J. Tick, A. W. Tucker, Elizabeth Vaughan, 
Shanti A. Vora, Abraham Wald, Allen Wallis, J. Wolfowitz, Miriam L. Yevick. 


Because of the extensive program of more than fifty invited addresses at the 
Symposium, the Institute meeting was devoted only to contributed papers. 
Professor David Blackwell of Howard and Stanford Universities presided at 
the Institute meeting, at which the following program was presented: 


1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum, University of 
Washington, Seattle. 

2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University of Stock- 
holm. 

3. On Minimax Statistical Decision Procedures and their Admissibility. Colin R. Blyth,
University of California, Berkeley. 

4. Sufficient Statistics and Unbiased Estimates for ‘‘Selected” Distributions. Douglas G. 
Chapman, University of Washington, Seattle. 

5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. Davis, U. S.
Naval Ordnance Testing Station, China Lake. 

6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards, University 
of California, Berkeley. 

7. About Some Classes of Sequential Procedures for Obtaining Confidence Intervals of Given 
Length. (Preliminary report). Werner R. Leimbacher, University of California, Berkeley. 

8. On the Stochastic Independence of Symmetric and Homogeneous Linear and Quadratic
Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing Station, China Lake.







9. The Distribution of the Maximum Deviation between Two Sample Cumulative Step Func- 
tions. Frank J. Massey, Jr., University of Oregon. 

10. An Iterative Construction of the Optimum Sequential Decision Procedure with Linear 
Cost Function. Lincoln E. Moses, Stanford University. 

11. On the Law of the Iterated Logarithm for Dependent Random Variables. Stanley W. 
Nash, University of California, Berkeley. 

12. Conditional Expectation and the Efficiency of Estimates. (By title). Paul G. Hoel, 
University of California, Los Angeles. 

13. Optimum Estimates for Location and Scale Parameters. (By title). Raymond P. Peter- 
son, University of California and National Bureau of Standards, Los Angeles. 


The social activities at the Symposium included a tea on August 1, an excur- 
sion on August 3, a dinner on August 7, a picnic on August 9, and coffee on 
July 31 and August 2, 4, 7, 8, 10, and 11. 


J. L. Hodges, Jr.
Associate Secretary 











THE ANNALS 
of 
MATHEMATICAL 
STATISTICS 


(FOUNDED BY H. C. CARVER) 


THE OFFICIAL JOURNAL OF THE INSTITUTE OF
MATHEMATICAL STATISTICS


INDEX OF PAPERS 


Author Index 
Subject Index 


Volumes I-XX 
1930-1949





THE ANNALS
OF MATHEMATICAL STATISTICS 


EDITED BY
S. S. WILKS, Editor
M. S. BARTLETT HARALD CRAMER J. NEYMAN

W. EDWARDS DEMING WALTER A. SHEWHART 
J. L. DOOB JOHN W. TUKEY 
W. FELLER A. WALD 

HAROLD HOTELLING 

WITH THE COOPERATION OF 


T. W. Anderson, Jr.    Churchill Eisenhart    H. B. Mann
David Blackwell        M. A. Girshick         Alexander M. Mood
J. H. Curtiss          Paul R. Halmos         Frederick Mosteller
J. F. Daly             Paul G. Hoel           H. E. Robbins
Harold F. Dodge        Mark Kac               Henry Scheffé
Paul S. Dwyer          E. L. Lehmann          Jacob Wolfowitz
                       William G. Madow


The ANNALS OF MATHEMATICAL STATISTICS is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore 2,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the ANNALS OF MATHEMATICAL STATISTICS, Mt.
Royal & Guilford Aves., Baltimore 2, Md., or to the Secretary of the Institute
of Mathematical Statistics, C. H. Fischer, Business Administration Building,
University of Michigan, Ann Arbor, Mich.


Changes in mailing address which are to become effective for a given issue 
should be reported to the Secretary on or before the 15th of the month preceding 
the month of that issue. The months of issue are March, June, September and 
December. 


Manuscripts for publication in the ANNALS OF MATHEMATICAL STATISTICS 
after December 31, 1949 should be sent to T. W. Anderson, Department of 
Mathematical Statistics, Columbia University, New York 27, New York. Man- 
uscripts should be typewritten double-spaced with wide margins, and the orig- 
inal copy should be submitted. Footnotes should be reduced to a minimum and 
whenever possible replaced by a bibliography at the end of the paper; formulae in 
footnotes should be avoided. Figures, charts, and diagrams should be drawn on 
plain white paper or tracing cloth in black India ink twice the size they are to 
be printed. Authors are requested to keep in mind typographical difficulties 
of complicated mathematical formulae. 


Authors will ordinarily receive only galley proofs. Fifty reprints without 
covers will be furnished free. Additional reprints and covers furnished at cost. 


The subscription price for the ANNALS is $10.00 inside the Western Hemi- 
sphere and $5.00 elsewhere. Single copies $3.00. Back numbers are available 
at $10.00 per volume or $3.00 per single issue. 


COMPOSED AND PRINTED AT THE
WAVERLY PRESS, Inc.
BALTIMORE, MD., U. S. A.


Entered as second-class matter at the Post Office at Baltimore, Maryland, under the act of March 3, 1879. 













AUTHOR INDEX 


The number in bold face type following the title of the paper indicates the volume number, and the number fol- 
lowing the colon indicates the beginning page. 


Abernethy, J. R. On the Elimination of 
Systematic Errors Due to Grouping. 
4 :263 
Albert, G. E. A Note on the Fundamental 
Identity of Sequential Analysis. 18 :593 
Correction to ‘‘A Note on the Fun- 
damental Identity of Sequential Anal- 
ysis.’’ 19 :426 
Alter, D. A Simple Form of Periodogram. 
8 :121 
——— Correction of Sample Movement 
Bias Due to Lack of High Contact and 
to Histogram Grouping. 10 :192 
Anderson, P. H. Distributions in Stratified 
Sampling. 13 :42 
Anderson, R. L. Distribution of the Serial 
Correlation Coefficient. 13:1 
Anderson, T. W. See Villars, D. S. 14:141 
On Card Matching. 14:426 
and Girshick, M. A. Some Exten- 
sions of the Wishart Distribution. 
15 :345 
The Non-Central Wishart Distribu- 
tion and Certain Problems of Multi- 
variate Statistics. 17:409 
and Rubin, H. Estimation of the 
Parameters of a Single Equation in a 
Complete System of Stochastic Equa- 
tions. 20 :46 
Andersson, W. On a New Method of Com- 
puting Non-Linear Regression Curves. 
5:81 
Andrews, F. C., and Birnbaum, Z. W. On 
Sums of Symmetrically Truncated 
Normal Random Variables. 20 :458 
Aroian, L. A. The Type B Gram-Charlier 
Series. 8 :183 
Continued Fractions for the Incom- 
plete Beta Function. 12:218 
A Study of R. A. Fisher’s z-Distribu- 
tion and the Related F-Distribution. 
12 :429 
A New Approximation to the Levels 
of Significance of the Chi-Square Dis- 
tribution. 14:93 


The Probability Function of the 
Product of Two Normally Distributed 
Variables. 18 :265 

The Fourth Degree Exponential 
Distribution Function. 19:589 

Bacon, H. M. Note on a Formula for the 
Multiple Correlation Coefficient. 9 :227 

— A Matrix Arising in Correlation 
Theory. 19 :422 

Baier, R. Sampling from a Changing Popu- 
lation. 16 :348 

Bailey, J. L., Jr. A Table to Facilitate the 
Fitting of Certain Logistic Curves. 
2 :355

Baker, G. A. Distribution of the Means of 
Samples of n Drawn at Random from a 
Population Represented by a Gram- 
Charlier Series. 1 :199 

Transformations of Bimodal Dis- 
tributions. 1 :334 

The Relation between the Means 
and Variances, Means Squared and 
Variances in Samples from Combina- 
tions of Normal Populations. 2 :333 

Distribution of the Means Divided 
by the Standard Deviations of Samples 
from Non-Homogeneous Populations. 
3:1 

Transformation of Non-Normal Fre- 
quency Distributions into Normal Dis- 
tributions. 6:113 

Note on the Distributions of the 
Standard Deviations and Second Mo- 
ments of Samples from a Gram-Charlier 
Population. 6 :127 

The Probability That the Mean of 
a Second Sample Will Differ from the 
Mean of a First Sample by Less Than a 
Certain Multiple of the Standard Devi- 
ation of the First Sample. 6:197 

Correlation Surfaces of Two or 
More Indices When the Components of 
the Indices Are Normally Distributed. 
8 :179 

Comparison of Pearsonian Approximations
with Exact Sampling Distributions of
Means and Variances in Samples from
Populations Composed of the Sums of
Normal Populations. 11 :219

Tests of Homogeneity for Normal
Populations. 12 :233

Distribution of the Ratio of Sample
Range to Sample Standard Deviation
for Normal and Combinations of Normal
Distributions. 17 :366

The Variance of the Proportions of 
Samples Falling within a Fixed Interval 
for a Normal Population. 20:123 

Bancroft, T. A. On Biases in Estimation 
Due to the Use of Preliminary Tests of 
Significance. 15 :190 

Note on an Identity in the Incomplete
Beta Function. 16 :98

Some Recurrence Formulae in the 
Incomplete Beta Function Ratio. 20 :451 

Banerjee, K. S. Weighing Designs and 
Balanced Incomplete Blocks. 19 :394 

A Note on Weighing Design. 20 :300 

Barankin, E. W. Locally Best Unbiased 
Estimates. 20 :477 

Barkey, I. H. On the Finite Differences of 
a Polynomial. 6:131 

Bartky, W. Multiple Sampling with Con- 
stant Probability. 14:363 

Bartlett, M. S. Complete Simultaneous 
Fiducial Distributions. 10 :129 

——— The General Canonical Correlation
Distribution. 18:1

Baten, W. D. Simultaneous Treatment of 
Discrete and Continuous Probability 
by Use of Stieltjes Integrals. 1 :95 

Correction for the Moments of a
Frequency Distribution in Two Variables.
2 :309

Combining Two Probability Func- 
tions. 5:13 

Battin, I. L. On the Problem of Multiple 
Matching. 13 :294 

Beale, F. S. On the Polynomials Related to 
the Differential Equation 
$\frac{1}{y}\frac{dy}{dx} = \frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2} \equiv \frac{N}{D}$. 8 :206

On Certain Class of Orthogonal 
Polynomials. 12:97 

Beckenbach, E. F. Convexity Properties of 
Generalized Mean Value Functions. 
13 :88 



































Bellinson, H. R. See von Neumann, J. 
12 :153 

Belz, M. H. Note on the Liapounoff Ine- 
quality for Absolute Moments. 18 :604 

Berger, A. and Wald, A. On Distinct Hy- 
potheses. 20 :104 

Berkson, J. Bayes’ Theorem. 1:42 

Bernstein, F. Regression and Correlation 
Evaluated by a Method of Partial 
Sums. 8:77 

Berry, C. E. A Criterion of Convergence for 
the Classical Iterative Method of 
Solving Linear Simultaneous Equa- 
tions. 16 :398 

Berstein, S. (Translated by E. Lehmer). 
Solution of a Mathematical Problem 
Connected with the Theory of Heredity. 
13 :53 

Birnbaum, Z. W. An Inequality for Mills’ 
Ratio. 13 :245 

and Zuckerman, H. S. An Inequality
Due to H. Hornich. 15 :328

Raymond, J., and Zuckerman, H. S.
A Generalization of Tchebycheff’s
Inequality to Two Dimensions. 18:70

On Random Variables with Comparable
Peakedness. 19:76

and Zuckerman, H. S. A Graphical
Determination of Sample Size for
Wilks’ Tolerance Limits. 20:313

See Andrews, F. C. 20 :458 

Bishop, M. C. A Note on Computation for 
Analysis of Variance. 10 :393 

Blackwell, D. On an Equation of Wald. 
17 :84 

and Girshick, M. A. On Functions
of Sequences of Independent Chance
Vectors with Applications to the Problem
of the “Random Walk” in k Dimensions.
17 :310

Conditional Expectation and Un- 
biased Sequential Estimation. 18:105 

—— and Girshick, M. A. A Lower Bound 
for the Variance of Some Unbiased 
Sequential Estimates. 18 :277 

Bleick, W. E. A Least Squares Accumula- 
tion Theorem. 11 :225 

Bliss, C. I. An Experimental Design for 
Slope-Ratio Assays. 17 :232 

See Cochran, W. G. 19:151 

Blom, G. A Generalization of Wald’s Fundamental
Identity. 20:439




























Boas, R. P., Jr. Representation of Proba- 
bility Distributions by Charlier Series. 
20 :376 

Boldyreff, J. W. Mathematical Foundation 
for a Method of Statistical Analysis of 
Household Budgets. 5 :216 

Bortkiewicz, L. v. The Relation between 
Stability and Homogeneity. 2:1 

Bose, R. C. A Note on Fisher’s Inequality 
for Balanced Incomplete Block De- 
signs. 20 :619 

Bowker, A. H. Note on Consistency of a 
Proposed Test for the Problem of Two 
Samples. 15 :98 

Computation of Factors for Tolerance
Limits on a Normal Distribution
When the Sample Is Large. 17 :238

On the Norm of a Matrix. 18 :285 

Brennan, J. F. See Housner, G. W. 19 :380 

Bridger, C. A. Note on Regression Func- 
tions in the Case of Three Second- 
Order Random Variables. 9 :309 

Bronowski, J., and Neyman, J. The Vari- 
ance of the Measure of a Two Dimen- 
sional Random Set. 16 :330 

Brookner, R. J. See Wald, A. 12:137 

Choice of One among Several Sta- 
tistical Hypotheses. 16 :221 

Brown, A. W. A Note on the Use of a Pear- 
son Type III Function in Renewal 
Theory. 11 :448 

Brown, G. M. On Sampling from Compound 
Populations. 4 :288 

Brown, G. W. On the Power of the L test 
for Equality of Several Variances. 
10 :119 

Reduction of a Certain Class of
Composite Statistical Hypotheses.
11 :254

and Tukey, J. W. Some Distributions
of Sample Means. 17:1

Discriminant Functions. 18 :514 

On Small-Sample Estimation. 18 :582 

Burgess, R. W. A Statistical Approach to 
Mathematical Formulation of Demand- 
Supply-Price Relationships. 3 :10 

Burr, I. W. Cumulative Frequency Func- 
tions. 13 :215 

Byrne, L. A Theory of Validation for Deriv- 
ative Specifications and Check Lists. 
6 :146

Byron, F. H. The Point Binomial and Prob- 
ability Paper. 6:21 












































Camp, B. H. Methods of Obtaining Prob- 
ability Distributions. 8:90 

Notes on the Distribution of the
Geometric Mean. 9:221

Some Recent Advances in Mathematical
Statistics, I. 13 :62

The Effect on a Distribution Function
of Small Changes in the Population
Function. 17 :226

Generalization to N Dimensions of 
Inequalities of the Tchebycheff Type. 
19 :568 

Camp, C. C. Note on the Numerical Evalua- 
tion of Double Series. 8:72 

Carlson, J. L. A Study of the Distribution 
of Means Estimated from Small Samples 
by the Method of Maximum Likelihood 
for Pearson’s Type II Curve. 3:86 

Carlton, G. A. Estimating the Parameters 
of a Rectangular Distribution. 17 :355 

Carver, H. C. Fundamentals of the Theory 
of Sampling. 1:101 

Fundamentals of the Theory of
Sampling. 1 :260

The Interdependence of Sampling
and Frequency Distribution Theory.
2 :82

Trapezoidal Rule for Computing
Seasonal Indices. 3 :361

Note on the Computation and Modification
of Moments. 4:229

A New Type of Average for Security
Prices. 5:73

Punched Card Systems and Statistics.
5 :153

The Fundamental Nature and Proof 
of Sheppard’s Adjustments. 7 :154 

Castagnetto, L. See Cernuschi, F. 17:53 

See Cernuschi, F. 18 :122 

Cernuschi, F., and Castagnetto, L. Chains 
of Rare Events. 17 :53 

Probability Schemes with 
Contagion in Space and Time. 18:122 

Chapman, D. W. The Generalized Problem 
of Correct Matchings. 6:85 

Chernoff, H. Asymptotic Studentization 
in Testing of Hypotheses. 20 :268 

Chung, K. L. On the Probability of the 
Occurrence of at Least m Events among 
n Arbitrary Events. 12 :328 

On Mutually Favorable Events.
13 :338
















Chung, K. L. Generalization of Poincaré’s 
Formula in the Theory of Probability. 
14:63 

On Fundamental Systems of Prob- 
abilities of a Finite Number of Events. 
14 :123 

Further Results on Probabilities of 
a Finite Number of Events. 14 :234 

and Hsu, L. C. A Combinatorial 
Formula and Its Application to the 
Theory of Probability of Arbitrary 
Events. 16:91 

The Approximate Distribution of 
Student’s Statistic. 17 :447 

On a Lemma by Kolmogoroff. 19:88 

Churchill, E. Information Given by Odd 
Moments. 17 :244 

Churchman, C. W. See Epstein, B. 15:90 

Clark, C. E. Note on the Binomial Distribu- 
tion. 8:116 

Cochran, W. G. Note on an Approximate 
Formula for Significance Levels of z. 
11 :93 

The Analysis of Variance When 
Experimental Errors Follow the Poisson 
or Binomial Laws. 11 :335 

The Comparison of Different Scales 
of Measurement for Experimental Re- 
sults. 14:205 

Relative Accuracy of Systematic 
and Stratified Random Samples for a 
Certain Class of Populations. 17 :164 

and Bliss, C. I. Discriminant Func- 
tions with Covariance. 19:151 

Cohen, A. C., Jr. The Numerical Computa- 
tion of the Product of Conjugate 
Imaginary Gamma Functions. 11:213 

Coleman, J. B. A Coefficient of Linear Corre- 
lation Based on the Method of Least 
Squares and the Line of Best Fit. 3:79 

Copeland, A. H. The Theory of Probability 
from the Point of View of Admissible 
Numbers. 3 :143 

Court, L. M. A Reciprocity Principle for the 
Neyman-Pearson Theory of Testing 
Statistical Hypotheses. 15 :326 

Cox, G. M. Enumeration and Construction 
of Balanced Incomplete Block Con- 
figurations. 11:72 

Craig, A. T. The Simultaneous Distribution 
of Mean and Standard Deviation in 
Small Samples. 3 :126 


On the Correlation between Certain 
Averages from Small Samples. 4:127 
—— On the Independence of Certain 
Estimates of Variance. 9:48 
——— On the Mathematics of the Repre- 
sentative Method of Sampling. 10 :26 
——— A Note on the Best Linear Esti- 
mate. 14:88 
Note on the Independence of Cer- 
tain Quadratic Forms. 14:195 
Bilinear Forms in Normally Corre- 
lated Variables. 18 :565 
Craig, C. C. Note on the Distribution of 
Means of Samples of N Drawn from a 
Type A Population. 2 :99 
On a Property of the Semi-invari- 
ants of Thiele. 2 :154 
Sampling in the Case of Correlated 
Observations. 2 :324 
—— On the Tchebycheff Inequality of 
Bernstein. 4:94 
Sheppard’s Corrections for a Dis- 
crete Variable. 7:55 
The Product Semi-invariants of the 
Mean and a Central Moment in Samples. 
11 :177 
Note on the Distribution of Non-Central
t with an Application. 12 :224
A Note on Sheppard’s Corrections. 
12 :339 
Some Recent Advances in Mathe- 
matical Statistics, II. 13:74 
Cramer, G. F. An Approximation to the 
Binomial Summation. 19 :592 
Cramér, H. Problems in Probability Theory. 
18 :165 
Curtiss, J. H. On the Distribution of the 
Quotient of Two Chance Variables. 
12 :409 
A Note on the Theory of Moment 
Generating Functions. 13 :430 
On Transformations Used in the 
Analysis of Variance. 14:107 
A Note on Some Single Sampling 
Plans Requiring the Inspection of a 
Small Number of Items. 17 :62 
Daly, J. F. See Wilks, S. S. 10 :225 
On the Unbiased Character of Like- 
lihood-Ratio Tests for Independence 
in Normal Systems. 11:1 
A Problem in Estimation. 12 :459 
On the Use of the Sample Range in 
an Analogue of Student’s t-Test. 17:71 







Dantzig, G. B. On a Class of Distributions 
That Approach the Normal Distribu- 
tion Function. 10 :247 

On the Non-Existence of Tests of
“Student’s” Hypothesis Having Power
Functions Independent of σ. 11:186

David, H. T. A Note on Random Walk. 
20 :603 

Davis, H. T. Polynomial Approximation by 
the Method of Least Squares. 4:155 

DeLury, D. B. Note on Correlations. 9:149 

Deming, W. E., and Stephan, F. F. On a 
Least Squares Adjustment of a Sampled 
Frequency Table When the Expected 
Marginal Totals Are Known. 11:427 

Discussion of Professor Hotelling’s 
Paper. 11 :470 

Derksen, J. B. D. On Some Infinite Series 
Introduced by Tschuprow. 10 :380 

Dieulefait, C. E. Note on a Method of 
Sampling. 13 :94 

Dixon, W. J. A Criterion for Testing the 
Hypothesis That Two Samples Are from 
the Same Population. 11 :199 

Further Contributions to the Prob- 
lem of Serial Correlation. 15 :119 

Table of Normal Probabilities for 
Intervals of Various Lengths and Loca- 
tions. 19 :424 

Dodd, E. L. The Use of Linear Functions to 
Detect Hidden Periods in Data Sepa- 
rated into Small Sets. 1:205 

Internal and External Means Arising 
from the Scaling of Frequency Func- 
tions. 8:12 

Interior and Exterior Means Ob- 
tained by the Method of Moments. 9 :153 

The Length of the Cycles Which 
Result from the Graduation of Chance 
Elements. 10 :254 

The Substitutive Mean and Certain 
Subclasses of This General Mean. 11 :163 

The Cyclic Effects of Linear Gradu- 
ations Persisting in the Differences of 
the Graduated Values. 12:127 

Some Generalizations of the Log- 
arithmic Mean and of Similar Means of 
Two Variates Which Become Indeter- 
minate When the Two Variates Are 
Equal. 12 :422 

Dodd, S. C. The Standard Error of a ‘‘Social 
Force’’. 7 :202 


Dodge, H. F. A Sampling Inspection Plan 
for Continuous Production. 14 :264 
Doob, J. L. The Limiting Distributions of 

Certain Statistics. 6:160 
Probability as Measure. 12 :206 
See von Mises, R. 12:215 
——— The Elementary Gaussian Processes. 
15 :229 
Heuristic Approach to the Kolmo- 
gorov-Smirnov Theorems. 20 :393 
Dorfman, R. The Detection of Defective 
Members of Large Populations. 14 :436 
Dressel, P. L. Statistical Semi-invariants 
and Their Estimates with Particular 
Emphasis on Their Relation to Alge- 
braic Invariants. 11:33 
A Symmetric Method for Obtaining 
Unbiased Estimates and Expected 
Values. 12:84 
Dunlap, H. F. An Empirical Determination 
of the Distribution of Means, Standard 
Deviations, and Correlation Coeffi- 
cients Drawn from Rectangular Popu- 
lations. 2 :66 
Dwyer, P. S. Moments of Any Rational 
Integral Isobaric Sample Moment Func- 
tion. 8:21 
— The Simultaneous Computation of 
Groups of Regression Equations and 
Associated Multiple Correlation Coeffi- 
cients. 8 :224 
Combined Expansions of Products 
of Symmetric Power Sums and of Sums 
of Symmetric Power Products with 
Application to Sampling. 9:1 
Combined Expansions of Products 
of Symmetric Power Sums and of Sums 
of Symmetric Power Products with 
Application to Sampling. (Continued). 
9:97 
The Computation of Moments with 
the Use of Cumulative Totals. 9:288 
The Cumulative Numbers and Their 
Polynomials. 11 :66 
Combinatorial Formulas for the 
rth Standard Moment of the Sample 
Sum, of the Sample Mean, and of the 
Normal Curve. 11 :353 
The Skewness of the Residuals in 
Linear Regression Theory. 12 :104 
—— The Doolittle Technique. 12 :449 
Grouping Methods. 13 :138 







Dwyer, P. S. A Matrix Presentation of 
Least Squares and Correlation Theory 
with Matrix Justification of Improved 
Methods of Solution. 15 :82 

See Waugh, F. V. 16 :259 

and MacPhail, M. S. Symbolic
Matrix Derivatives. 19:517

Pearsonian Correlation Coefficients 
Associated with Least Squares Theory. 
20 :404 

Dvoretzky, A. On the Strong Stability of a 
Sequence of Events. 20 :296 

Eisenhart, C. The Interpretation of Certain 
Regression Methods and Their Use in 
Biological and Industrial Research. 
10 :162 

A Note on A Priori Information. 
10 :390 
See Swed, F. S. 14:66 

Enlow, E. R. Quadrature of the Normal 
Curve. 5:136

Epstein, B., and Churchman, C. W. On the 
Statistics of Sensitivity Data. 15:90 

Some Applications of the Mellin 
Transform in Statistics. 19 :370 

A Modified Extreme Value Problem. 
20 :99 

The Distribution of Extreme Values 
in Samples Whose Members Are Subject 
to a Markoff Chain Condition. 20 :590 

Erdés, P. On a Theorem of Hsu and Rob- 
bins. 20 :286 

Evans, W. D. Note on the Moments of a 
Binomially Distributed Variate. 11 :106 

Ezekiel, M. The Sampling Variability of 
Linear and Curvilinear Regressions. 
1:275 

Feldman, H. M. The Distributions of the 
Precision Constant and Its Square in 
Samples of n from a Normal Population.
3 :20 

Mathematical Expectation of Prod- 
uct Moments of Samples Drawn from a 
Set of Infinite Populations. 6 :30 

Feller, W. On the Integral Equation of 

Renewal Theory. 12 :243 

On a General Class of ‘‘Contagious”’ 
Distributions. 14:389 

Note on the Law of Large Numbers 
and ‘‘Fair’’ Games. 16 :301 

On the Normal Approximation to 
the Binomial Distribution. 16 :319 


Symbolic 


—— On the Kolmogorov-Smirnov Limit
Theorems for Empirical Distributions. 
19:177 

Ferris, C. D., Grubbs, F. E., and Weaver, 
C. L. Operating Characteristics for the 
Common Statistical Tests of Significance.
17:178

Fertig, J. W. On a Method of Testing the 
Hypothesis That an Observed Sample 
of n Variables and of Size N Has
Been Drawn from a Specified Pop- 
ulation of the Same Number of Var- 
iables. 7 :113 

and Proehl, E. A. A Test of a Sample 
Variance Based on Both Tail Ends of 
the Distribution. 8 :193 

Finney, D. J. The Frequency Distribution 
of Deviates from Means and Regression 
Lines in Samples from a Multivariate 
Normal Population. 17 :344

Fischer, C. H. On Correlation Surfaces of 
Sums with a Certain Number of Ran- 
dom Elements in Common. 4:103 

On Multiple and Partial Correlation 
Coefficients of a Certain Sequence of 
Sums. 4:278 

A Sequence of Discrete Variables 
Exhibiting Correlation Due to Common 
Elements. 13 :97 

Fisher, R. A. A Note on Fiducial Inference. 
10 :383 

Forsyth, C. H. Some Simple Developments 
in the Use of the Coefficient of Stability. 
8:5 

Frankel, A. See Kullback, S. 11 :209 

Frankel, L. R. See Hotelling, H. 9:87 

—— See Stock, J. S. 10:288 

Fréchet, M. Definition of the Probable 
Deviation. 18:288 

——— The General Relation between the 
Mean and Mode for a Discontinuous 
Variate. 18 :290 

Friedman, M. A Comparison of Alternative 
Tests of Significance for the Problem 
of m Rankings. 11:86 

Garver, R. Concerning the Limits of a Meas- 
ure of Skewness. 3 :358 

Geiringer, H. On the Probability Theory of 
Arbitrarily Linked Events. 9 :260 

Errata. 10 :202 

A Generalization of the Law of Large 
Numbers. 11 :393 











——— A Note on the Probability of Arbi- 
trary Events. 13 :238 

——— Observations on Analysis of Vari- 
ance Theory. 13 :350 

——— On the Probability Theory of Link- 
age in Mendelian Heredity. 15 :25 

——— Further Remarks on Linkage The- 
ory in Mendelian Heredity. 16 :390 

On the Definition of Distance in the 
Theory of the Gene. 16 :393 

Girshick, M. A. On the Sampling Theory of 
the Roots of Determinantal Equations. 
10 :203 

——— Note on the Distribution of Roots 
of a Polynomial with Random Complex 
Coefficients. 13 :235 

A Correction. 13 :337 

See Anderson, T. W. 15 :345 

——— and Mosteller, F., and Savage, L. J. 
Unbiased Estimates for Certain Bino- 
mial Sampling Problems with Applica- 
tions. 17:13 

——— Contributions to the Theory of
Sequential Analysis I. 17:123

Contributions to the Theory of 
Sequential Analysis II, III. 17:282 

——— See Blackwell, D. 17:310 

See Blackwell, D. 18 :277 

Godwin, H. J. A Note on Kac’s Derivation 
of the Distribution of the Mean Devi- 
ation. 20 :127 

Some Low Moments of Order Sta- 
tistics. 20 :279 

Goldberg, H., and Levine, H. Approximate
Formulas for the Percentage Points and
Normalization of t and χ². 17:216

Goodman, Leo A. On the Estimation of the 
Number of Classes in a Population. 
20 :572 

Gordon, R. D. The Estimation of a Quotient 
When the Denominator is Normally 
Distributed. 12:115 

Values of Mills’ Ratio of Area to 
Bounding Ordinate and of the Normal 
Probability Integral for Large Values 
of the Argument. 12 :364 

Greenleaf, H. E. H. Curve Approximation 
by Means of Functions Analogous to the 
Hermite Polynomials. 3 :204 

Greenwood, J. A. Variance of a General 
Matching Problem. 9:56 

——— and Greville, T. N. E. On the Probability
of Attaining a Given Standard
Deviation Ratio in an Infinite Series
of Trials. 10:297

Greenwood, R. E. Numerical Integration 
for Linear Sums of Exponential Func- 
tions. 20 :608 

Greville, T. N. E. See Greenwood, J. A. 
10 :297 

——— The Frequency Distribution of a
General Matching Problem. 12:350

——— A Generalization of Waring’s Formula.
15:218

——— On Multiple Matching with One
Variable Deck. 15:432

——— Remark on the Note “A Generalization
of Waring’s Formula”. 18:605

Griffin, H. D. Fundamental Formulas for 
the Doolittle Method, Using Zero-order 
Correlation Coefficients. 2 :150 

Grubbs, F. E. On the Distribution of the 
Radial Standard Deviation. 15:75 

——— See Ferris, C. D. 17:178

——— See Morse, A. P. 18:194 

On Designing Single Sampling In- 
spection Plans. 20 :242 

Grummann, H. R. Some Notes on Exponen- 
tial Analysis. 7 :133 

Gruzewska, H.-M. The Precision of the 
Weighted Average. 4:196 

Gumbel, E. J. The Return Period of Flood 
Flows. 12 :163 

On Serial Numbers. 14 :163 

——— On the Reliability of the Classical
Chi-Square Test. 14:253

——— Ranges and Midranges. 15:414

——— On the Independence of the Extremes
in a Sample. 17:78

——— The Distribution of the Range.
18:384

Gurland, J. Inversion Formulae for the 
Distribution of Ratios. 19 :228 

Guttman, L. A Note on the Derivation of 
Formulae for Multiple and Partial Cor- 
relation. 9:305 

——— An Approach for Quantifying Paired
Comparisons and Rank Order. 17:144

——— Enlargement Methods for Computing
the Inverse Matrix. 17:336

——— An Inequality for Kurtosis. 19:277

——— A Distribution-Free Confidence Interval
for the Mean. 19:410

Halmos, P. R. Random Alms. 15:182

——— The Theory of Unbiased Estimation.
17:34














Halmos, P. R., and Savage, L. J. Application 
of the Radon-Nikodym Theorem to the 
Theory of Sufficient Statistics. 20:225 

Hansen, M. H., and Hurwitz, W. N. On 
the Theory of Sampling from Finite 
Populations. 14 :333 

On the Determination of
Optimum Probabilities in Sampling.
20:426

Harris, T. E. Note on Differentiation under 
the Expectation Sign in the Funda- 
mental Identity of Sequential Analysis. 
18 :294 

Branching Processes. 19 :474 

Harshbarger, B. On the Analysis of a Cer- 
tain Six-by-Six Four-Group Lattice 
Design. 15 :307 

——— On the Analysis of a Certain Six-by-Six
Four-group Lattice Design Using
the Recovery of Inter-block Information.
16:387

Hart, B. I. See von Neumann, J. 12:153 

——— and von Neumann, J. Tabulation of
the Probabilities for the Ratio of the
Mean Square Successive Difference to
the Variance. 13:207

——— Significance Levels for the Ratio
of the Mean Square Successive Difference
to the Variance. 13:445

Hartman, P., and Wintner, A. On the Effect 
of Decimal Corrections on Errors of 
Observation. 19 :389 

Hasel, A. A. Estimation of Volume in Timber 
Stands by Strip Sampling. 13 :179 

Hastings, C., Jr., Mosteller, F., Tukey, J. 
W., and Winsor, C. P. Low Moments 
for Small Samples: A Comparative 
Study of Order Statistics. 18 :413 

Hatke, M. A. A Certain Cumulative 
Probability Function. 20 :461 

Henderson, R. A Postulate for Observa- 
tions. 3 :32 

Hendricks, W. A. The Use of the Relative 
Residual in the Application of the 
Method of Least Squares. 2 :458 

——— Relative Residuals Considered As
Weighted Simple Residuals in the Application
of the Method of Least
Squares. 3:157

——— The Standard Error of Any Analytic
Function of a Set of Parameters Evaluated
by the Method of Least Squares.
5:107


























Problem Involving the Lexis Theory
of Dispersion. 6:78

——— Analysis of Variance Considered As
an Application of Simple Error Theory.
6:117

——— and Robey, K. W. The Sampling
Distribution of the Coefficient of Variation.
7:129

——— An Approximation to “Student’s”
Distribution. 7:210

Herbach, L. H. Bounds for Some Functions 
Used in Sequentially Testing the Mean 
of a Poisson Distribution. 19 :400 

Hildebrandt, E. H. Systems of Polynomials 
Connected with the Charlier Expan- 
sions and the Pearson Differential and 
Difference Equations. 2 :379 

Hoeffding, W. A Class of Statistics with 
Asymptotically Normal Distributions. 
19:293 

——— A Non-Parametric Test of Independence.
19:546

Hoel, P. G. A Significance Test for Com- 
ponent Analysis. 8:149 

——— On the Chi-Square Distribution for
Small Samples. 9:158

——— The Errors Involved in Evaluating
Correlation Determinants. 11:58

——— On Methods of Solving Normal
Equations. 12:354

——— On Indices of Dispersion. 14:155

——— The Accuracy of Sampling Methods
in Ecology. 14:289

——— Testing the Homogeneity of Poisson
Frequencies. 16:362

——— The Efficiency of the Mean Moving
Range. 17:475

——— Discriminating between Binomial
Distributions. 18:556

——— On the Uniqueness of Similar Regions.
19:66

——— A Non-Parametric Test of Independence.
19:546

——— and Peterson, R. P. A Solution to the
Problem of Optimum Classification.
20:433

Horst, P. A Short Method for Solving for a 
Coefficient of Multiple Correlation. 3 :40 

A Method for Determining the Co- 
efficients of a Characteristic Equa- 
tion. 6:83 

Horton, H. B. A Method for Obtaining 
Random Numbers. 19:81 

















































——— and Smith, R. T., III. A Direct
Method for Producing Random Digits
in Any Number System. 20:82

Hotelling, H. The Generalization of Stu- 
dent’s Ratio. 2 :360 

——— and Solomons, L. M. The Limits of a
Measure of Skewness. 3:141

——— and Pabst, M. R. Rank Correlation
and Tests of Significance Involving No
Assumption of Normality. 7:29

——— and Frankel, L. R. The Transformation
of Statistics to Simplify Their
Distribution. 9:87

——— The Selection of Variates for Use in
Prediction with Some Comments on
the Problem of Nuisance Parameters.
11:271

——— The Teaching of Statistics. 11:457

——— Experimental Determination of the
Maximum of a Function. 12:20

——— Some New Methods in Matrix Calculation.
14:1

——— Further Points on Matrix Calculation
and Simultaneous Equations.
14:440

——— Some Improvements in Weighing
and Other Experimental Techniques.
15:297

——— Note on a Matric Theorem of A. T.
Craig. 15:427

Housner, G. W., and Brennan, J. F. The 
Estimation of Linear Trends. 19:380 

Howell, J. M. Control Chart for Largest and 
Smallest Values. 20 :305 

Hsu, C. T. On Samples from a Normal 
Bivariate Population. 11 :410 

Samples from Two Bivariate Normal 
Populations. 12 :279 

Hsu, L. C. Some Combinatorial Formulas 
with Applications to Probable Values 
of a Polynomial-Product and to Dif- 
ferences of Zero. 15 :399 

See Chung, K. L. 16:91 

——— Some Combinatorial Formulas on
Mathematical Expectation. 16:369

Note on an Asymptotic Expansion 
of the nth Difference of Zero. 19 :273 

Hsu, P. L. Notes on Hotelling’s Generalized
T. 9:231

——— The Approximate Distributions of
the Mean and Variance of a Sample of
Independent Variables. 16:1


















































——— On the Approximate Distribution of
Ratios. 16:204

——— On the Power Functions for the E²-Test
and the T-Test. 16:278

——— On the Asymptotic Distributions of
Certain Statistics Used in Testing the
Independence between Successive Observations
from a Normal Population.
17:350

Huntington, E. V. Frequency Distribution 
of Product and Quotient. 10:195 

Hurwitz, H., and Kac, M. Statistical Analy- 
sis of Certain Types of Random Func- 
tions. 15 :173 

Hurwitz, W. N. See Hansen, M. H. 14:333 

See Hansen, M. H. 20 :426 

Jenkins, T. N. A Short Method and Tables 
for the Calculation of the Average and 
Standard Deviation of Logarithmic 
Distributions. 3:45 

Johnson, E., Jr. Estimates of Parameters 
by Means of Least Squares. 11 :453 

Johnson, N. L. Parabolic Test for Linkage. 
11 :227 

Jones, H. L. Exact Lower Moments of Order 
Statistics in Small Samples from a Nor- 
mal Distribution. 19 :270 

Jordan, C. Approximation and Graduation 
according to the Principle of Least 
Squares by Orthogonal Polynomials. 
3 :257 

Joseph, J. A. On the Coefficients of the Ex- 
pansion of X™, 10 :293 

Kac, M. See Hurwitz, H. 15:173

——— Random Walk in the Presence of Absorbing
Barriers. 16:62

——— A Remark on Independence of
Linear and Quadratic Forms Involving
Independent Gaussian Variables.
16:400

















——— and Siegert, A. J. F. An Explicit
Representation of a Stationary Gaussian
Process. 18:438

On the Characteristic Functions of 
the Distributions of Estimates of 
Various Deviations in Samples from a 
Normal Population. 19 :257 

Kaplansky, I. A Characterization of the 
Normal Distribution. 14:197 

——— The Asymptotic Distribution of
Runs of Consecutive Elements. 16:200













Kaplansky, I., and Riordan, J. Multiple 
Matching and Runs by the Symbolic 
Method. 16 :272 

Kavanagh, A. J. Note on the Adjustment of 
Observations. 12:111 

Kempthorne, O. The Factorial Approach to 
the Weighing Problem. 19:238 

Kendall, D. G. On the Generalized “Birth-and-Death”
Process. 19:1

Kendall, M. G., and Smith, B. B. The 
Problem of m Rankings. 10 :275 

Conditions for Uniqueness in the 
Problem of Moments. 11 :402 

Corrections to a Paper on the 
Uniqueness Problem of Moments. 12 :464 

Kenney, J. F. The Regression Systems of 
Two Sums Having Random Elements 
in Common. 10:70 

Kent, R. H. See von Neumann, J. 12 :153 

Keyfitz, N. Graduation by a Truncated 
Normal. 9:66 

Kimball, B. F. Orthogonal Polynomials 
Applied to Least Square Fitting of 
Weighted Observations. 11 :348 

Limited Type of Primary Proba- 
bility Distribution Applied to Annual 
Flood Flows. 13 :318 

Note on Asymptotic Value of Proba- 
bility Distribution of Sum of Random 
Variables Which Are Greater Than a 
Set of Arbitrarily Chosen Numbers. 
15 :423 

Sufficient Statistical Estimation 
Functions for the Parameters of the 
Distribution of Maximum Values. 
17 :299 

Some Basic Theorems for Develop- 
ing Tests of Fit for the Case of the Non- 
Parametric Probability Distribution 
Function, I. 18:540 

An Approximation to the Sampling 
Variance of an Estimated Maximum 
Value of Given Frequency Based on 
Fit of Doubly Exponential Distribution 
of Maximum Values. 20:110 

Kincaid, W. M. Note on the Error in Inter- 
polation of a Function of Two Inde- 
pendent Variables. 19:85 

Solution of Equations by Interpola- 
tion. 19 :207 

King, W. I. The Annals of Mathematical 
Statistics. 1:1 

Kirkham, W. J. Moments about the Arithmetic
Mean of a Binomial Frequency
Distribution. 6:96
Note on the Derivation of the Mul- 
tiple Correlation Coefficient. 8:68 
Kishen, N. On the Design of Experiments 
for Weighing and Making Other Types 
of Measurements. 16 :294 
Kolmogorov, A. Confidence Limits for an 
Unknown Distribution Function. 12 :461 
Koopmans, T. Serial Correlation and Quad- 
ratic Forms in Normal Variables. 13 :14 
Kossack, C. F. On the Mechanics of Classi- 
fication. 16 :95 
Kozakiewicz, W. On the Convergence of 
Sequences of Moment Generating Func- 
tions. 18 :61 
Kullback, S. An Application of Charac- 
teristic Functions to the Distribution 
Problem of Statistics. 5 :263 
A Note on the Analysis of Variance. 
6 :76 
——— A Note on Sheppard’s Corrections.
6:158
On Samples from a Multivariate 
Normal Population. 6:202 
The Distribution Laws of the Differ- 
ence and Quotient of Variables Inde- 
pendently Distributed in Pearson Type 
III Laws. 7:51 
On Certain Distributions Derived
from the Multinomial Distribution.
8:127
Note on a Matching Problem. 10:77 
A Note on Neyman’s Theory of 
Statistical Estimation. 10:388 
and Frankel, A. A Simple Sampling 
Experiment on Confidence Intervals. 
11:209 
On the Charlier Type B Series. 
18 :574 
Correction to ‘‘On the Charlier Type 
B Series’’. 19 :427 
Laderman, J. See Lowan, A. N. 10:360 
“Student’s” Ratio for Samples of
Two Items Drawn from Non-Normal
Universes. 10:376
Landau, H. G. Note on the Variance and 
Best Estimates. 16 :219 
Larguier, E. H. On Certain Coefficients 
Used in Mathematical Statistics. 6 :220 
On a Method for Evaluating the 
Moments of a Bernoulli Distribution. 
7:191 







Larson, H. D. On the Moments about the
Arithmetic Mean of a Hypergeometric
Frequency Distribution Function.
10:198

Lawther, H. P., Jr. The Extended Probability
Theory for the Continuous
Variable with Particular Application
to the Linear Distribution. 4:241

Lehmann, E. L. On Families of Admissible 
Tests. 18:97 

On Optimum Tests of Composite 
Hypotheses with One Constraint. 18 :473 
and Stein, C. Most Powerful Tests of 
Composite Hypotheses. I. Normal Dis- 
tributions. 19:495 
On the Theory of Some Non- 
Parametric Hypotheses. 20:28 

Lehmer, E. Inverse Tables of Probabilities 
of Errors of the Second Kind. 15 :388 

Leipnik, R. B. Distribution of the Serial 
Correlation Coefficient in a Circularly 
Correlated Universe. 18:80 

Lengyel, B. A. On Testing the Hypotheses 
that Two Samples Have Been Drawn 
from a Common Normal Population. 
10 :365 

Lev, J. The Point Biserial Coefficient of 
Correlation. 20 :125 

Levene, H., and Wolfowitz, J. The Covari- 
ance Matrix of Runs Up and Down. 
15 :58 

On a Matching Problem Arising in 
Genetics. 20:91 

Levine, H. See Goldberg, H. 17 :216 

Lewis, W. T. A Reconsideration of Shep- 
pard’s Corrections. 6:11 

Littauer, S. B. See Peach, P. 17:81 

Lonseth, A. T. Systems of Linear Equations 
with Coefficients Subject to Error. 
13:332 

On Relative Errors in Systems of 
Linear Equations. 16 :323 

Lotka, A. J. A Contribution to the Theory 
of Self-Renewing Aggregates, with 
Special Reference to Industrial Re- 
placement. 10:1 

On an Integral Equation in Popula- 
tion Analysis. 10:144 
The Progeny of an Entire Popula- 
tion. 13 :115 
Application of Recurrent Series in 
Renewal Theory. 19:190 
Lowan, A. N., and Laderman, J. On the
Distribution of Errors in nth Tabular
Differences. 10:360
Lukacs, E. A Characterization of the Nor- 
mal Distribution. 13 :91 
Lukomski, J. On Some Properties of Multi- 
dimensional Distributions. 10 :236 
Maceda, E. C. On the Compound and 
Generalized Poisson Distributions. 
19 :414 
MacPhail, M. S. See Dwyer, P. S. 19:517 
Madow, W. G. Contributions to the Theory
of Comparative Statistical Analysis, I.
Fundamental Theorems of Comparative
Analysis. 8:159
The Distribution of Quadratic 
Forms in Non-Central Normal Random 
Variables. 11:100 
Limiting Distributions of Quad- 
ratic and Bilinear Forms. 11 :125 
and Madow, L. On the Theory of 
Systematic Sampling, I. 15:1 
Note on the Distribution of the 
Serial Correlation Coefficient. 16 :308 
On a Source of Downward Bias in 
the Analysis of Variance and Covari- 
ance. 19 :351 
On the Limiting Distributions of 
Estimates Based on Samples from 
Finite Universes. 19 :535 
On the Theory of Systematic Sam- 
pling, II. 20:333 
Malmquist, S. A Statistical Problem Con- 
nected with the Counting of Radioac- 
tive Particles. 18:255 
Mann, H. B., and Wald, A. On the Choice 
of the Number of Intervals in the Ap- 
plication of the Chi Square Test. 18 :306 
The Construction of Orthogonal 
Latin Squares. 13 :418 
——— and Wald, A. On Stochastic Limit and
Order Relationships. 14:217
On the Construction of Sets of 
Orthogonal Latin Squares. 14 :401 
On a Problem of Estimation Occur- 
ring in Public Opinion Polls. 16:85 
On a Test for Randomness Based on 
Signs of Differences. 16 :193 
Note on a Paper by C. W. Cotter- 
man and L. H. Snyder. 16 :311 
Correction to the Paper “On a
Problem of Estimation Occurring in
Public Opinion Polls”. 17:87







Mann, H. B., and Whitney, D. R. On a Test 
of Whether One of Two Random Vari- 
ables Is Stochastically Larger Than the 
Other. 18:50 

Marks, E. S. A Lower Bound for the Ex- 
pected Travel among m Random Points. 
19:419 

Matern, B. Independence of Non-Negative 
Quadratic Forms in Normally Cor- 
related Variables. 20:119 

Mathisen, H. C. A Method of Testing the 
Hypothesis That Two Samples Are 
from the Same Population. 14:188 

Mauchly, J. W. Significance Test for Spheric- 
ity of a Normal n-Variate Distribu- 
tion. 11 :204 

McCarthy, M. D. On the Application of the 
z-Test to Randomized Blocks. 10 :337 

McCarthy, P. J. Approximate Solutions for 
Means and Variances in a Certain 
Class of Box Problems. 18 :349 

McMillan, B. Spread of Minima in Large 
Samples. 20:444 

McPherson, J. C. On Mechanical Tabula- 
tion of Polynomials. 12 :317 

Merrell, M. On Certain Relationships between
β₁ and β₂ for the Point Binomial.
4:216

Miner, J. R. The Standard Error of a Mul- 
tiple Regression Equation. 2 :320 

Molina, E. C. Bayes’ Theorem. 2 :23 

Some Fundamental Curves for the 
Solution of Sampling Problems. 17 :325 

Montroll, E. W. On the Theory of Markoff 
Chains. 18:18 

Mood, A. M. On the L₁ Test for Many
Samples. 10:187

The Distribution Theory of Runs. 
11 :367 

On the Joint Distribution of the 
Medians in Samples from a Multivariate 
Population. 12 :268 

On the Dependence of Sampling 
Inspection Plans upon Population Dis- 
tributions. 14:415 

On Hotelling’s Weighing Problem. 
17 :432 

Tests of Independence in Contin- 
gency Tables As Unconditional Tests. 
20 :114 

Moritz, R. E. A New Theory of Deprecia- 
tion of Physical Assets. 3 :108 

Morse, A. P. and Grubbs, F. E. The Estimation
of Dispersion from Differences.
18:194
Mosteller, F. Note on an Application of 
Runs to Quality Control Charts. 12:228 
See Girshick, M. A. 17:13 
On Some Useful “Inefficient” Statistics.
17:377
See Hastings, C. Jr. 18:413 
A k-Sample Slippage Test for an 
Extreme Population. 19:58 
Mouzon, E. D., Jr. Equimodal Frequency 
Distributions. 1 :137 
Murphy, R. B. Non-Parametric Tolerance 
Limits. 19 :581 
Myers, R. J. Note on Koshal’s Method of
Improving the Parameters of Curves
by the Use of the Method of Maximum
Likelihood. 5:320
Nanda, D. N. Distribution of a Root of a 
Determinantal Equation. 19:47 
Limiting Distribution of a Root of a 
Determinantal Equation. 19:340 
Neyman, J. On the Problem of Confidence 
Intervals. 6:111 
Tests of Statistical Hypotheses 
Which Are Unbiased in the Limit. 9:69 
On a New Class of “Contagious”
Distributions Applicable in Entomology
and Bacteriology. 10:35
On a Statistical Problem Arising in 
Routine Analyses and in Sampling 
Inspections of Mass Production. 12 :46 
See Bronowski, J. 16:330 
Noether, G. E. On Confidence Limits for 
Quantiles. 19 :416 
On a Theorem by Wald and Wolfo- 
witz. 20 :455 
Norris, N. Inequalities among Averages.
6:72
Convexity Properties of Generalized 
Mean Value Functions. 8:118 
Some Efficient Measures of Relative 
Dispersion. 9:214 
The Standard Errors of the Geo- 
metric Means and Their Application to 
Index Numbers. 11 :445 
Oberg, E. N. Approximate Formulas for the 
Radii of Circles Which Include a Speci- 
fied Fraction of a Normal Bivariate 
Distribution. 18 :442 
Olds, E. G. Distributions of Sums of Squares 
of Rank Differences for Small Numbers 
of Individuals. 9 :133 







On a Method of Sampling. 11 :355 
The 5% Significance Levels for Sums of 
Squares of Rank Differences and a Cor- 
rection. 20 :117 
Olmstead, P. S. Note on Theoretical and 
Observed Distributions of Repetitive 
Occurrences. 11 :363 
Distribution of Sample Arrange- 
ments for Runs Up and Down. 17 :24 
and Tukey, J. W. A Corner Test for 
Association. 18 :495 
Olshen, C. A. Transformations of the Pear- 
son Type III Distribution. 9:176 
O’Toole, A. L. On Symmetric Functions 
and Symmetric Functions of Symmetric 
Functions. 2 :102 
On Symmetric Functions of More 
Than One Variable and of Frequency 
Functions. 3 :56 
On the System of Curves for Which 
the Method of Moments Is the Best 
Method of Fitting. 4:1 
A Method of Determining the Con- 
stants in the Bimodal Fourth Degree 
Exponential Function. 4:79 
On the Degree of Approximation of
Certain Quadrature Formulas. 4:143
On a Best Value of R in Sample of
R from a Finite Population of N. 5:146
Otter, R. The Multiplicative Process.
20:206
Pabst, M. R. See Hotelling, H. 7 :29 
Palmer, E. Z. Error and Unreliability in 
Seasonals. 1 :345 
Paulson, E. On Certain Likelihood-Ratio 
Tests Associated with the Exponential 
Distribution. 12 :301 
An Approximate Normalization of 
the Analysis of Variance Distribution. 
13 :233 
A Note on the Estimation of Some 
Mean Values for a Bivariate Distribu- 
tion. 13 :440 
A Note on Tolerance Limits. 14:90 
A Note on the Efficiency of the Wald 
Sequential Test. 18 :447 
A Multiple Decision Procedure for 
Certain Problems in the Analysis of 
Variance. 20:95 
Peach, P. and Littauer, S. B. A Note on 
Sampling Inspection. 17 :81 


Peiser, A. M. Asymptotic Formulas for 
Significance Levels of Certain Dis- 
tributions. 14 :56 

Correction to ‘‘Asymptotic Formu- 
las for Significance Levels of Certain 
Distributions’’. 20 :128 

Peterson, R. P. See Hoel, P. G. 20 :433 

Pierce, J. A. A Study of a Universe of n 
Finite Populations with Application 
to Moment-Function Adjustments for 
Grouped Data. 11 :311 

Pinney, E. Fitting Curves with Zero or 
Infinite End Points. 18 :127 

Pitman, E. J. G., and Robbins, H. Applica- 
tion of the Method of Mixtures to 
Quadratic Forms in Normal Variates. 
20 :552 

Plackett, R. L. Boundaries of Minimum Size 
in Binomial Sampling. 19 :575 

Pollard, H. S. On the Relative Stability of 
the Median and Arithmetic Mean with 
Particular Reference to Certain Fre- 
quency Distributions Which Can Be 
Dissected into Normal Distributions. 
5 :227 

Powell, R. W. Successive Integration As a 
Method for Finding Long Period Cycles. 
1:123 

Proehl, E. A. See Fertig, J. W. 8:193 

Quenouille, M. H. Problems in Plane 
Sampling. 20 :355 

The Joint Distribution of Serial 
Correlation Coefficients. 20 :561 

Quensel, C.-E. A Method of Determining 
the Regression Curve When the Mar- 
ginal Distribution Is of the Normal 
Logarithmic Type. 7 :196 

Rasch, G. A Functional Equation for 
Wishart’s Distribution. 19:262 

Raymond, J. See Birnbaum, Z. W. 18:70 

Reich, E. On the Convergence of the Classi- 
cal Iterative Method of Solving Linear 
Simultaneous Equations. 20 :448 

Reiersøl, O. A Method for Recurrent Computation
of All of the Principal Minors of
a Determinant and Its Application in
Confluence Analysis. 11:193

Richards, P. I. Probability of Co-incidence 
for Two Periodically Recurring Events. 
19 :16 

Rider, P. R. On Small Samples from Certain 
Non-Normal Universes. 2 :48 










Rietz, H. L. On Certain Properties of Fre- 
quency Distributions Obtained by a 
Linear Fractional Transformation of 
the Variates of a Given Distribution. 
2 :38 

——— On the Frequency Distribution of
Certain Ratios. 7:145

——— On the Distribution of the “Student”
Ratio for Small Samples from
Certain Non-Normal Populations.
10:265

Riordan, J. Moment Recurrence Relations 
for Binomial, Poisson and Hypergeo- 
metric Frequency Distributions. 8 :103 

See Kaplansky, I. 16 :272 

Inversion Formulas in Normal Vari- 
able Mapping. 20:417 

Robb, R. A. Modifications of the Link Rela- 
tive and Interpolation Methods of 
Determining Seasonal Variation. 1 :352 

Robbins, H. E. On the Measure of a Ran- 
dom Set. 15:70 

——— On Distribution-free Tolerance Limits
in Random Sampling. 15:214

——— On the Expected Values of Two
Statistics. 15:321

——— On the Measure of a Random Set,
II. 16:342

——— Acknowledgement of Priority. 18:297

——— Convergence of Distributions. 19:72

——— The Distribution of a Definite
Quadratic Form. 19:266

——— Mixture of Distributions. 19:360

——— The Distribution of Student’s t
When the Population Means Are Unequal.
19:406

——— See Pitman, E. J. G. 20:552

Roberts, J. L. Some Practical Interpola- 
tion Formulas. 6 :133 

——— The Elimination of Perpetual
Calendars. 7:44

——— Applications of Two Osculatory
Formulas. 8:1

——— A Coefficient of Correlation between
Scholarship and Salaries. 8:66

Robey, K. W. See Hendricks, W. A. 7:129

Robinson, S. An Experiment Regarding the
χ² Test. 4:285

Rodrigues, M. D. On an Extension of the
Concept of Moment with Applications
to Measures of Variability, General
Similarity, and Overlapping. 16:74








Royer, E. B. A Simple Method for Calculat- 
ing Mean Square Contingency. 4:75 
Rubin, H. On the Distribution of the Serial 
Correlation Coefficient. 16 :211 

See Anderson, T. W. 20:46 

Salvosa, L. R. Tables of Pearson’s Type III 
Function. 1 :191 

——— Derivatives of the Pearson Type III
Curve. 1:274

Samuelson, P. A. Conditions That the Roots
of a Polynomial Be Less Than Unity
in Absolute Value. 12:360

——— A Method of Determining Explicitly
the Coefficients of the Characteristic
Equation. 13:424

——— Fitting General Gram-Charlier
Series. 14:179

Sanford, V. See Walker, H. M. 5:1

Santalo, L. A. On the First Two Moments of
the Measure of a Random Set. 18:37

Sard, A. Smoothest Approximation Formulas.
20:612

Satterthwaite, F. E. A Concise Analysis of
Certain Algebraic Forms. 12:77

——— A Generalized Analysis of Variance.
13:34

——— Linear Restrictions on Chi-Square.
13:326

——— Generalized Poisson Distribution.
13:410

——— Error Control in Matrix Calculation.
15:373

Savage, L. J. See Girshick, M. A. 17:13

——— A Uniqueness Theorem for Unbiased
Sequential Binomial Estimation. 18:295

——— See Halmos, P. R. 20:225

Scheffé, H. On the Theory of Testing Composite
Hypotheses with One Constraint.
13:280

——— On the Ratio of the Variances of Two
Normal Populations. 13:371

——— On Solutions of the Behrens-Fisher
Problem Based on the t-Distribution.
14:35

——— On a Measure Problem Arising in
the Theory of Non-Parametric Tests.
14:227

——— Statistical Inference in the Non-Parametric
Case. 14:305

——— and Tukey, J. W. A Formula for
Sample Sizes for Population Tolerance
Limits. 15:217











































——— A Note on the Behrens-Fisher Problem.
15:430

——— and Tukey, J. W. Non-Parametric
Estimation, I. Validation of Order
Statistics. 16:187

——— A Useful Convergence Theorem for
Probability Distributions. 18:434

Schmidt, R. Statistical Analysis of One- 
Dimensional Distributions. 5:30 

Seth, G. R. On the Variance of Estimates. 
20 :1 

Shen, C.-L. Fundamentals of the Theory 
of Inverse Sampling. Part I. Introduc- 
tion. 7 :62 

Shohat, J. Stieltjes Integrals in Mathematical
Statistics. 1:73

Shook, B. L. Synopsis of Elementary Math- 
ematical Statistics. 1:14 

A Synopsis of Elementary Mathe- 
matical Statistics. 1 :224 

Siegert, A. J. F. See Kac, M. 18 :438 

Silber, J. Multiple Sampling for Variables. 
19 :246 

Simon, H. A. Symmetric Tests of the 
Hypothesis That the Mean of One 
Normal Population Exceeds That of 
Another. 14:149 

Singleton, R. R. A Method for Minimizing 
the Sum of Absolute Values of Devia- 
tions. 11 :301 

Smirnov, N. Table for Estimating the Good- 
ness of Fit of Empirical Distributions. 
19 :279 

Smith, B. B. See Kendall, M. G. 10:275 

Smith, C. D. On Tchebycheff Approxima- 
tion for Decreasing Functions. 10 :190 

Smith, J. H. Estimation of Linear Func- 
tions of Cell Proportions. 18 :231 

Smith, R. T., III. See Horton, H. B. 20:82 

Sobel, M., and Wald, A. A Sequential De- 
cision Procedure for Choosing One of 
Three Hypotheses Concerning the Un- 
known Mean of a Normal Distribution. 
20 :502 

Solomons, L. M. See Hotelling, H. 3:141 

Starkey, D. M. A Test of the Significance of 
the Difference between Means of Sam- 
ples from Two Normal Populations 
without Assuming Equal Variances. 
9:201 

——— The Distribution of the Multiple
Correlation Coefficient in Periodogram
Analysis. 10:327

Stein, C. A Two-Sample Test for a Linear 
Hypothesis Whose Power Is Independ- 
ent of the Variance. 16 :243 

——— A Note on Cumulative Sums. 17:498

——— and Wald, A. Sequential Confidence
Intervals for the Mean of a Normal
Distribution with Known Variance.
18:427

——— See Lehmann, E. L. 19:495

——— See Lehmann, E. L. 20:28

Stephan, F. F. See Deming, W. E. 11:427

——— An Iterative Method of Adjusting
Sample Frequency Tables When Expected
Marginal Tables Are Known.
13:166

——— The Expected Value and Variance
of the Reciprocal and Other Negative
Powers of a Positive Bernoullian Variate.
16:50

Stewart, W. M. A Note on the Power of the 
Sign Test. 12 :236 

Stock, J. S., and Frankel, L. R. The Alloca- 
tion of Samplings among Several Strata. 
10 :288 

Strecker, G. On Evaluating a Coefficient 
of Partial Correlation. 6 :143 

Swed, F. S., and Eisenhart, C. Tables for 
Testing Randomness of Grouping in a 
Sequence of Alternatives. 14:66 

Thiele, T. N. The Theory of Observations. 
2:165 

Thompson, W. R. On a Criterion for the 
Rejection of Observations and the Dis- 
tribution of the Ratio of Deviation to 
Sample Standard Deviation. 6:214 

——— On Confidence Ranges for the Median
and Other Expectation Distributions
for Populations of Unknown Distribution
Form. 7:122

Biological Applications of Normal 
Range and Associated Significance 
Tests in Ignorance of Original Distribu- 
tion Forms. 9:281 

Tintner, G. On Tests of Significance in 
Time Series. 10 :139 

A Note on Rank, Multicollinearity 
and Multiple Regression. 16 :304 

Toops, H. A. On the Systematic Fitting of 
Straight Line Trends by Stencil and 
Calculating Machine. 5:21 


































Treloar, A. E., and Wilder, M. A. The
Adequacy of “Student’s” Criterion of
Deviations in Small Sample Means.
5:324

Truesdell, C. A Note on the Poisson- 
Charlier Functions. 18 :450 

Tuckerman, L. B. On the Mathematically 
Significant Figures in the Solution of 
Simultaneous Linear Equations. 12 :307 

Tukey, J. W. See Scheffé, H. 15 :217 

——— See Scheffé, H. 16:187

——— See Brown, G. W. 17:1

——— An Inequality for Deviations from
Medians. 17:75

——— and Wilks, S. S. Approximation of
the Distribution of the Product of
Beta Variables by a Single Beta Variable.
17:318

——— See Hastings, C., Jr. 18:413

——— See Olmstead, P. S. 18:495

——— Non-Parametric Estimation, II. Statistically
Equivalent Blocks and Tolerance
Regions—The Continuous Case.
18:529

——— Non-Parametric Estimation, III. Statistically
Equivalent Blocks and Multivariate
Tolerance Regions—The Discontinuous
Case. 19:30

——— Approximate Weights. 19:91

——— Sufficiency, Truncation and Selection.
20:309

——— Moments of Random Group Size
Distributions. 20:523

Ullman, J. The Probability of Convergence 
of an Iterative Process of Inverting a 
Matrix. 15 :205 

Vajda, S. On the Constituent Items of the 
Reduction and the Remainder in the 
Method of Least Squares. 16 :381 

Vatnsdal, J. R. Minimal Variance and Its 
Relation to Efficient Moment Tests. 
17 :198 

Villars, D. S. and Anderson, T. W. Some 
Significance Tests for Normal Bivariate 
Distributions. 14:141 

A Significance Test and Estimation 
in the Case of Exponential Regression. 
18 :596 

von Mises, R. A Modification of Bayes’ 
Problem. 9 :256 

The Limits of a Distribution Function
If Two Expected Values Are Given. 10:99

On the Foundations of Probability
and Statistics. 12:191

and Doob, J. L. Discussion of Papers
on Probability Theory. 12:215

On the Correct Use of Bayes’ Formula.
13:156

On the Problem of Testing Hypotheses.
14:238

On the Classification of Observation
Data into Distinct Groups. 16:68

On the Asymptotic Distribution of 
Differentiable Statistical Functions. 
18 :309 

von Neumann, J., Bellinson, H. R., Kent, 
R. H., Hart, B. I. The Mean Square 
Successive Difference. 12 :153 

Distribution of the Ratio of the
Mean Successive Difference to the
Variance. 12:367

A Further Remark Concerning the
Distribution of the Ratio of the Mean
Square Successive Difference to the
Variance. 13:86

See Hart, B. I. 13 :207 

von Schelling, H. A Formula for the Partial 
Sums of Some Hypergeometric Series. 
20 :120 

Votaw, D. F., Jr. The Probability Distribu- 
tion of the Measure of a Random Linear 
Set. 17 :240 

Testing Compound Symmetry in a 
Normal Multivariate Distribution. 
19 :447 

Wald, A. A Generalization of Markoff’s In- 
equality. 9:244 

and Wolfowitz, J. Confidence Limits
for Continuous Distribution Functions.
10:105

Contributions to the Theory of
Statistical Estimation and Testing
Hypotheses. 10:299

A Note on the Analysis of Variance
with Unequal Class Frequencies. 11:96

and Wolfowitz, J. On a Test Whether
Two Samples Are from the Same
Population. 11:147

The Fitting of Straight Lines If
Both Variables Are Subject to Error.
11:284

Asymptotically Most Powerful Tests
of Statistical Hypotheses. 12:1

and Wolfowitz, J. Note on Confidence
Limits for Continuous Distribution
Functions. 12:118
and Brookner, R. J. On the Distribution
of Wilks’ Statistic for Testing the
Independence of Several Groups of
Variates. 12:137
On the Analysis of Variance in Case 
of Multiple Classifications with Un- 
equal Class Frequencies. 12 :346 
Some Examples of Asymptotically 
Most Powerful Tests. 12 :396 
Asymptotically Shortest Confidence 
Intervals. 13:127
See Mann, H. B. 13:306 
Setting of Tolerance Limits When 
the Sample Is Large. 13 :389 
—— On the Power Function of the 
Analysis of Variance Test. 13 :434 
An Extension of Wilks’ Method of 
Setting Tolerance Limits. 14:45 
On the Efficient Design of Statistical 
Investigations. 14 :134 
See Mann, H. B. 14:217 
and Wolfowitz, J. An Exact Test for 
Randomness in the Non-Parametric 
Case Based on Serial Correlation. 14 :378 
On a Statistical Problem Arising in 
the Classification of an Individual into 
One of Two Groups. 15 :145 
On Cumulative Sums of Random 
Variables. 15 :283 
Note on a Lemma. 15 :330 
and Wolfowitz, J. Statistical Tests 
Based on Permutations of the Observa- 
tions. 15 :358 
and Wolfowitz, J. Sampling Inspection Plans
for Continuous Production Which Insure
a Prescribed Limit on the Outgoing
Quality. 16:30
Sequential Tests of Statistical Hy- 
potheses. 16 :117 
Some Generalizations of the Theory 
of Cumulative Sums of Random Vari- 
ables. 16 :287 
and Wolfowitz, J. Tolerance Limits 
for a Normal Distribution. 17 :208 
Some Improvements in Setting Lim- 
its for the Expected Number of Obser- 
vations Required by a Sequential 
Probability Ratio Test. 17 :466 
——— Differentiation under the Expectation
Sign in the Fundamental Identity of
Sequential Analysis. 17:493

See Stein, C. 18 :427 

An Essentially Complete Class of 
Admissible Decision Functions. 18 :549 

——A Note on Regression Analysis. 
18 :586 

Asymptotic Properties of the Maximum
Likelihood Estimate of an Unknown
Parameter of a Discrete Stochastic
Process. 19:40

Estimation of a Parameter When the
Number of Unknown Parameters
Increases Indefinitely with the Number
of Observations. 19:220

and Wolfowitz, J. Optimum Character
of the Sequential Probability Ratio
Test. 19:326

See Berger, A. 20:104

Statistical Decision Functions. 20:165

See Sobel, M. 20:502 

Note on the Consistency of the Maxi- 
mum Likelihood Estimate. 20 :595 

Walker, H. M., and Sanford, V. The Ac- 
curacy of Computation with Approxi- 
mate Numbers. 5:1 

Walsh, J. E. Some Significance Tests Based 
on Order Statistics. 17:44 

Some Order Statistic Distributions
for Samples of Size Four. 17:246

On the Power Function of the Sign
Test for Slippage of Means. 17:358

Concerning the Effect of Intraclass
Correlation on Certain Significance
Tests. 18:88

An Extension to Two Populations of
an Analogue of Student’s t-Test Using
the Sample Range. 18:280

On the Power Efficiency of a t-Test
Formed by Pairing Sample Values.
18:601

On the Use of the Non-Central
t-Distribution for Comparing Percentage
Points of Normal Populations. 19:93

Some Significance Tests for the Median
Which Are Valid under Very General
Conditions. 20:64

On the Range-Midrange Test and 
Some Tests with Bounded Significance 
Levels. 20 :257 

— On the Power Function of the “Best”
t-test Solution of the Behrens-Fisher
Problem. 20:616

Walsh, J. E. Concerning Compound Ran- 
domization in a Binary System. 20 :580 
Waugh, F. V. A Note Concerning Hotel- 
ling’s Method of Inverting a Parti- 
tioned Matrix. 16 :216 
and Dwyer, P. S. Compact Computa- 
tion of the Inverse of a Matrix. 16 :259 
Weaver, C. L. See Ferris, C. D. 17 :178 
Weida, F. M. On Measures of Contingency. 
5 :308 
On Certain Distribution Functions 
When the Law of the Universe Is Pois- 
son’s First Law of Error. 6 :102 
Welch, B. L. On Confidence Limits and 
Sufficiency with Particular Reference 
to Parameters of Location. 10:58 
On the Studentization of Several 
Variances. 18 :118 
Welker, E. L. The Distribution of the Mean. 
18:111 
Wertheimer, A. A Generalized Error Func- 
tion. 3 :64 
Note on Zoch’s Paper on the Postu- 
late of the Arithmetic Mean. 8:112 
A Note on Confidence Intervals and 
Inverse Probability. 10:74 
Wherry, R. J. A New Formula for Pre- 
dicting the Shrinkage of the Coefficient 
of Multiple Correlation. 2 :440 
The Shrinkage of the Brown-Spear- 
man Prophecy Formula. 6:183 
Whitney, D. R. See Mann, H. B. 18:50
Wicksell, S. D. Remarks on Regression. 1 :3 
Wilder, M. A. See Treloar, A. E. 5 :324 
Wilkins, J. E., Jr. A Note on Skewness and 
Kurtosis. 15 :333 
Wilks, S. S. Moments and Distributions of 
Estimates of Population Parameters 
from Fragmentary Samples. 3 :163 
On the Sampling Distribution of the
Multiple Correlation Coefficient. 3:196
The Likelihood Test of Independ- 
ence in Contingency Tables. 6 :190 
The Large-Sample Distribution of
the Likelihood Ratio for Testing
Composite Hypotheses. 9:60
Shortest Average Confidence In- 
tervals from Large Samples. 9:166 
Fiducial Distributions in Fiducial 
Inference. 9 :272 
and Daly, J. F. An Optimum Prop- 
erty of Confidence Regions Associ- 
ated with the Likelihood Function. 
10 :225 


On the Determination of Sample 
Sizes for Setting Tolerance Limits. 
12 :91 
Statistical Prediction with Special 
Reference to the Problem of Tolerance 
Limits. 13 :400 
Sample Criteria for Testing Equality
of Means, Equality of Variances and
Equality of Covariances in a Normal
Multivariate Distribution. 17:257
See Tukey, J. W. 17:318 
Will, H. S. On Fitting Curves to Observa- 
tional Series by the Method of Differ- 
ences. 1 :159 
——— On a General Solution for the
Parameters of Any Function with
Application to the Theory of Organic
Growth. 7:165
Williams, J. D. Moments of the Ratio of 
the Mean Square Successive Difference 
to the Mean Square Difference in 
Samples from a Normal Universe. 
12 :239 
An Approximation to the Prob- 
ability Integral. 17 :363 
Winsor, C. P. See Hastings, C., Jr. 18:413
Wintner, A. On the Shape of the Angular 
Case of Cauchy’s Distribution Curves. 
18: 589 
See Hartman, P. 19: 389 
Wisniewski, J. K. A Problem in Least 
Squares. 8: 145 
Wold, H. O. A. On Prediction in Stationary 
Time Series. 19: 558 
Wolfowitz, J. See Wald, A. 10: 105 
See Wald, A. 11 :147 
——— See Wald, A. 12:118 
——— Additive Partition Functions and 
a Class of Statistical Hypotheses. 
13 :247 
On the Theory of Runs with Some 
Applications to Quality Control. 14 :280 
See Wald, A. 14:378 
——— See Levene, H. 15:58 
— Note on Runs of Consecutive
Elements. 15:97
Asymptotic Distributions of Runs 
Up and Down. 15:163 
See Wald, A. 15:358 
— See Wald, A. 16:30 
——— See Wald, A. 17:208 
——— Confidence Limits for the Fraction
of a Normal Population Which Lies
between Two Given Limits. 17:483
On Sequential Binomial Estimation. 
17 :489 
Consistency of Sequential Binomial 
Estimates. 18 :131 
The Efficiency of Sequential Esti- 
mates and Wald’s Equation for Se- 
quential Processes. 18 :215 
See Wald, A. 19:326
On Wald’s Proof of the Consistency 
of the Maximum Likelihood Estimate. 
20 :601 
—— The Power of the Classical Tests 
Associated with the Normal Distribu- 
tion. 20 :540 
Wong, Y. K. An Application of Orthogon- 
alization Process to the Theory of 
Least Squares. 6 :53 
On Standard Error for the Line of
Mutual Regression. 7:47
Woodbury, M. A. Rank Correlation When 
There Are Equal Variates. 11 :358 
On a Probability Distribution. 
20 :311 


Wright, S. The Method of Path Coefficients. 
5 :161 
Yosida, K. Brownian Motion on the Sur- 
face of the 3-Sphere. 20 :292 
Young, L. C. On Randomness in Ordered 
Sequences. 12 :293 
Yuan, P. T. On the Logarithmic Frequency 
Distribution and the Semi-logarithmic 
Correlation Surface. 4:30 
Ziaud-Din, M. On Differential Operators 
Developed by O’Toole. 9:63 
Zoch, R. T. Invariants and Covariants of 
Certain Frequency Curves. 5 :124 
Some Interesting Features of Fre- 
quency Curves. 6:1 
On the Postulate of the Arithmetic 
Mean. 6:171 
Reply to Mr. Wertheimer’s Paper. 
8 :177 
Zuckerman, H. S. See Birnbaum, Z. W.
15:328
—— See Birnbaum, Z. W. 18:70 
See Birnbaum, Z. W. 20 :313 
Zygmund, A. A Remark on Characteristic 
Functions. 18 :272 





SUBJECT INDEX 


The number in bold face type following the title of the paper indicates the volume number, and the number following
the colon indicates the beginning page.


Acceptance Sampling (see Sampling In- 
spection) 
Annals of Mathematical Statistics 
King, W. I. 1:1 
Arithmetic Mean (see also Sampling Theory 
of Means) 
Pollard, H. S. 5:227
Wertheimer, A. 8:112 
Zoch, R. T. 6:171 
Analysis of Covariance 
Bliss, C. I., and Cochran, W. G. 19:151 
Cochran, W. G., and Bliss, C. I. 19:151 
Analysis of Variance 
Bishop, M. C. 10 :393 
Cochran, W. G. 11 :335 
Craig, A. T. 9:48 14:195 
Curtiss, J. H. 14:107 
Geiringer, H. 13 :238 
Harshbarger, B. 15 :307 
Hendricks, W. A. 6:117 
Kullback, S. 6:76
Madow, W. G. 19:351 
Matern, B. 20:119 
Paulson, E. 13 :233 20 :95 
Satterthwaite, F. E. 12:77 
Scheffé, H. 13 :371 
Vajda, S. 16 :381 
Wald, A.11:96 12:346 13 :434 
Bayes’ Theorem (see Inverse Probability) 
Behrens-Fisher Problem 
Scheffé, H.14:35 15:430 
Walsh, J. E. 20:616 
Beta-Function, Incomplete 
Aroian, L. A. 12:218 
Bancroft, T. A. 16:98 20 :451 
Binomial Distribution (see also Sampling 
Theory) 
Byron, F. H. 6:21 
Clark, C. E. 8:116 
Cochran, W. G. 11 :335 
Cramer, G. F. 19 :592 
Dorfman, R. 14 :486 
Evans, W. D. 11:106 
Feller, W. 16 :319 


16 :387 


13 334 


Girshick, M. A., Savage, L. J. and Mos- 
teller, F. 17:13 
Hoel, P. G. 18 :556 
Kirkham, W. J. 6:96 
Merrell, M. 4:216 
Mosteller, F., Savage, L. J. and Girshick,
M. A. 17:13
Plackett, R. L. 19:575 
Riordan, J. 8:103 
Savage, L. J., Mosteller, F. and Girshick, 
M. A. 17:13 
Stephan, F. F. 16:50 
Wolfowitz, J. 17 :489 18 :131 
Characteristic Function (see also Moment- 
Generating Function) 
Kac, M. 19:257 
Kullback, S. 5 :263 
Zygmund, A. 18 :272 
Chi-Square Distribution (see Chi-Square 
Test) 
Chi-Square Test 
Aroian, L. A. 14:93 
Goldberg, H., and Levine, H. 17 :216 
Gumbel, E. J. 14:253 
Hoel, P. G. 9:158 
Levine, H., and Goldberg, H. 17 :216 
Mann, H. B., and Wald, A. 13:306 
Satterthwaite, F. E. 13 :326 
Wald, A., and Mann, H. B. 13 :306 
Coefficient of Stability 
Forsyth, C. H. 8:5 
Coefficient of Variation 
Hendricks, W. A., and Robey, K. W. 7 :129 
Robey, K. W., and Hendricks, W. A. 7:129 
Computation (see also Curve-Fitting, Least 
Squares, and Matrix Theory) 
Bishop, M. C. 10:393 
Camp, C. C. 8:72 
Cohen, A. C., Jr. 11:213 
Dwyer, P. S. 8:224 
12 :449 
Dwyer, P.S.,and Waugh, F. V. 16 :259 
Enlow, E. R. 5:136 
Greenwood, R. E. 20 :608 




9 :288 11 :66

Griffin, H. D. 2:150

Guttman, L. 17 :336 

Hoel, P. G. 11:58 

Hotelling, H. 14:1 

Jenkins, T. N. 3:45 

Lonseth, A. T. 15 :323 

McPherson, J. C. 12:317 

O’Toole, A. L. 4:143 

Reiersøl, O. 11:193

Sanford, V., and Walker, H. M. 5:1

Satterthwaite, F. E. 15 :373 

Walker, H. M., and Sanford, V. 5:1 

Waugh, F. V., and Dwyer, P. S. 16 :259 
Confidence Intervals 

Carlton, A. G. 17 :355 

Daly, J. F., and Wilks, S. S. 10:225

Frankel, A., and Kullback, S. 11 :209 

Kolmogoroff, A. 12 :461 

Kullback, S., and Frankel, A. 11 :209 

Neyman, J. 6:111 

Noether, Gottfried E. 19 :416 

Thompson, W. R. 7 :122 

Wald, A., and Wolfowitz, J. 10:105 12:118

Wald, A. 13 :127 

Welch, B. L. 10:58 

Wertheimer, A. 10:74 

Wilks, S. S. 9:166

Wilks, S. S., and Daly, J. F. 10 :225 

Wolfowitz, J., and Wald, A. 10:105 12:118

Wolfowitz, J. 17 :483 

Confidence Limits (see Confidence Inter- 
vals) 

Confidence Regions (see Confidence Inter- 
vals) 

Contagious Distribution 

Castagnetto, L., and Cernuschi, F. 17:53 
18 :122 
Cernuschi, F., and Castagnetto, L. 17:53 
18 :122 
Feller, W. 14:389 
Neyman, J.10:35 
Contingency and Contingency Tables 
Mood, A. M. 20:114 
Royer, E. B. 4:75 
Weida, F. M. 5:308
Wilks, S. S. 6:190

Corrections for Grouping (see Sheppard’s 
Corrections) 

Correlation (see also Rank Correlation, 
Multiple Correlation, Partial Correla- 
tion and Serial Correlation) 

Bacon, H. M. 19:422 


12 :354 
14:440 


Baker, G. A. 8:179 
Bartlett, M. S. 18:1
Bernstein, F. 8:77 
Coleman, J. B. 3:79 
Craig, A. T. 4:127 
DeLury, D. B. 9:149 
Dunlap, H. F. 2 :66 
Dwyer, P. S. 15:82
Dwyer, P. S., and MacPhail, M. S. 19:517
Dwyer, P. S. 20:404
Fischer, C. H. 4:103 
Griffin, H. D. 2:150 
Hoel, P. G. 11:58 
Lev, J. 20:125 
MacPhail, M. S., and Dwyer, P. S. 19:517
Olmstead, P. S., and Tukey, J. W. 18:495 
O’Toole, A. L. 5 :146 
Roberts, J. L. 8:66 
Tukey, J. W., and Olmstead, P. S. 18:495
Wherry, R. J. 6:183 
Curve-Fitting 
Bailey, J. L., Jr. 2:355 
Greenleaf, H. E. H. 3 :204 
O’Toole, A. L. 4:1 
Pinney, E. 18 :127 
Toops, H. A. 5:21
Tukey, J. W., and Wilks, S. S. 17:318
Wilks, S. S., and Tukey, J. W. 17:318
Will, H. S. 1:159
Depreciation 
Moritz, R. E. 3:108 
Descriptive Statistics 
Boldyreff, J. W. 5 :216 
Bortkiewicz, L. v. 2:1 
Burgess, R. W. 3:10 
Byrne, L. 6 :146 
Carver, H. C. 6:153 
Eisenhart, C. 10:86 
Henderson, R. 3 :32 
Larguier, E. H. 6 :220 
Norris, N. 9:214 
Shook, B. L. 1:14 
Thiele, T. N. 2:165 
Design of Experiments 
Banerjee, K. S. 19:394 20:300
Bliss, C. I. 17 :232 
Bose, R. C. 20 :619 
Cox, G. M. 11:72 
Harshbarger, B. 15 :307 
Hotelling, H. 15 :297 
Kempthorne, O. 19 :238 
Kishen, N. 16 :294 
Mann, H. B. 13 :416 14 :401 
McCarthy, M. D. 10 :337 


13 :97 


1 :224 


16 :387
Mood, A. M. 17 :432 
Wald, A. 14:134 
Discriminant Functions 
Bliss, C. I. and Cochran, W. G.19:151 
Brown, G. M. 18:514 
Cochran, W. G., and Bliss, C. I. 19:151 
Kossack, C. F. 16 :95 
Wald, A.15 :145 
Dispersion (see also Sampling Theory of 
Variances) 
Hendricks, W. A. 6:78 
Hoel, P. G. 14:155 
Distribution Functions (see also Contagious 
Distributions, Binomial Distribution, 
Chi-Square Test, Exponential Distri- 
bution, Fisher’s z-Test, Hypergeometric 
Distribution, Multinomial Distribution, 
Negative Binomial Distribution, Pois- 
son Distribution, Sampling Theory, 
Snedecor’s F-Test) 
Aroian, L. A. 18:265 
Baker, G. A. 8:179 
Baten, W. D. 6:13 
Burr, I. W. 13 :215 
Camp, B. H. 8:90 
Curtiss, J. H. 12 :409 
Fischer, C. H. 4:103 
Gurland, J. 19 :228 
Hatke, M. A. 20 :461 
Huntington, E. V. 10:195 
Kimball, B. F. 15 :423 
Kullback, S. 7:51
Mouzon, E. D., Jr. 1 :137 
Schmidt, R. 6 :30 
Tukey, J. W., and Wilks, S. S. 17:318
Wilks, S. S. and Tukey, J. W., 17:318 
Yuan, P. T. 4:30 
Zoch, R. T. 5:124 6:1
Error Theory (see Sampling Theory) 
Estimation of Parameters (see Statistical 
Estimation) 
Exponential Distribution 
Aroian, L. A. 19 :589 
Grummann, H. R. 7 :133 
Kimball, B. F. 20:110 
Malmquist, S. 18 :255 
O’Toole, A. L. 4:79 
Paulson, E. 12 :301 
Extreme Values 
Epstein, B.20:99 20:590 
Gumbel, E. J. 12:163 16:414 
18 :354 


9:221 17 :226 


17 :78 


Kimball, B. F. 13 :318 
McMillan, B. 20 :444 
Fiducial Inference and Distribution (see 
also Confidence Intervals) 
Bartlett, M.S. 10:129 
Fisher, R. A. 10 :383 
Wilks, S. S. 9:272

Fiducial Intervals (see Confidence Intervals)

Finite Differences 
Barkey, I. H. 6:131 
Hsu, L. C. 15 :399 
Joseph, J. A. 10 :293 
Laderman, J. and Lowan, A. N. 10:360 
Lowan, A. N. and Laderman, J. 10 :360 

Fisher’s z-Test (see also Snedecor’s F-Test)
Aroian, L. A. 12 :429 
Cochran, W. G. 11:93 
McCarthy, M. D. 10:337 

Frequency Curves (see Distribution Functions)

Frequency Functions (see Distribution Functions)

Genetics, Mathematical Theory of 
Bernstein, S. 13 :53 
Geiringer, H. 15:25 
Mann, H. B. 16:311 

Geometric Mean
Camp, B. H. 9:221 
Norris, N.11 :445 

Graduation (see also Curve-Fitting) 
Dodd, E. L. 10:254 12:127 
Greville, T. N. E. 16:218 18 :605 
Jordan, C. 3 :257 
Kavanagh, A. J.12:111 
Keyfitz, N. 9:66 
Roberts, J. L. 8:1 
Sard, A. 20 :612 

Gram-Charlier Series 
Aroian, L. A. 8:183 
Baker, G. A. 1:199 
Boas, R. P., Jr. 20 :376 
Hildebrandt, E. H. 2:379 
Kullback, S. 18:574 19:427
Samuelson, P. A. 14:175 
Truesdell, C. 18 :450 

Hypergeometric Distribution 
Larson, H. D. 10:198 
Riordan, J. 8:103
von Schelling, H. 20 :120 

Index Numbers 
Carver, H. C. 3 :361 


19 :273 




16:390 16:393 


6 :127

Interpolation 
Kincaid, W.M.19:85  19:207 
Roberts, J. L. 6 :133 
Inverse Probability 
Berkson, J. 1:42 
Eisenhart, C. 10 :390 
Molina, E. C. 2:23 
von Mises, R. 9:256 13:156
Wertheimer, A. 10:74 
Kurtosis 
Birnbaum, Z. W. 19:76 
Guttman, L. 19 :277 
Wilkins, J. E., Jr. 15 :333 
Least Squares 
Bleick, W. E. 11 :225 
Coleman, J. B. 3:79 
Davis, H. T. 4:155 
Deming, W. E., and Stephan, F. F. 11 :427 
Dwyer, P. S. 12:449 15:82 20:404
Griffin, H. D. 2:150
Hendricks, W. A. 2:458 $:157 6:107 
Hoel, P. G. 12 :354 
Horst, P. 6:83 
Johnson, E., Jr. 11 :453 
Jordan, C. 3 :257 
Kimball, B. F. 11 :348 
Samuelson, P. A. 13 :424 
Stephan, F. F., and Deming, W. E. 11 :427 
Tukey, J. W.19:91 
Vajda, S. 16 :381 
Wisniewski, J. K. 8:145 
Wong, Y. K. 6:53 
Lexis Theory of Dispersion 
Hendricks, W. A. 6:78 
Likelihood Ratio (see Statistical Hypothe- 
ses, Testing of) 
Linear Simultaneous Equations (see Least 
Squares) 
Berry, C. E. 16 :398 
Lonseth, A. T. 13 :332 15 :323 
Reich, E. 20 :448 
Tuckerman, L. B. 12 :307 
Logarithmic Distribution Function (see Dis- 
tribution Functions) 
Markoff Chains 
Epstein, B. 20 :590 
Montroll, E. 18:18 
Matching Theory 
Anderson, T. W. 14 :426 
Battin, I. L. 18 :294 
Chapman, D. W. 6:85 
Greenwood, J. A. 9:56 


Greville, T. N. E. 12 :350 15 :432 
Kaplansky, I. and Riordan, J. 16 :272 
Kullback, S. 10:77

Levene, H. 20:91 

Riordan, J. and Kaplansky, I. 16:272 


Matrix Theory 


Bacon, H. M. 19 :422 

Bowker, A. H. 18:285 

Dwyer, P. S. 15:82

Dwyer, P. S., and Waugh, F. V. 16:259 
Dwyer, P. S., and MacPhail, M. S. 19:517
Guttman, L. 17 :336 

Hotelling, H. 14:1 14:440 16:427 
MacPhail, M. S., and Dwyer, P. S. 19:517
Samuelson, P. A. 13 :424 

Satterthwaite, F. E. 15 :373 

Ullman, J. 15:205

Waugh, F. V. 16 :216 

Waugh, F. V., and Dwyer, P. S. 16:259


Maximum Likelihood, Method of (see also
Statistical Hypotheses, Testing of)
Carlson, J. L. 3:86
Daly, J. F., and Wilks, S. S. 10:225
Myers, R. J. 5:320
Wald, A. 20:595
Wilks, S. S., and Daly, J. F. 10:225
Wolfowitz, J. 20:601 


Mean Deviation 


Godwin, H. J. 20:127 
Singleton, R. R. 11 :301 
Tukey, J. W. 17:75 


Mean Square Successive Difference (see
also Serial Correlation)

Bellinson, H. R., von Neumann, J.,
Kent, R. H., and Hart, B. I. 12:153

Hart, B. I. 13 :445 

Hart, B. I., and von Neumann, J. 13 :207 

Hart, B. I., Bellinson, H. R., von Neu- 
mann, J., and Kent, R. H. 12 :153 

Hsu, P. L. 17 :350 

Kent, R. H., Bellinson, H. R., Hart, B. I.,
and von Neumann, J. 12:153

von Neumann, J., Bellinson, H. R., Kent,
R. H., Hart, B. I. 12:153

von Neumann, J. 12 :367 13 :86 

von Neumann, J., and Hart, B. I. 13 :207 

Williams, J. D. 12 :239 


Mean (see also Arithmetic Mean, Geometric
and Sampling Theory of Means)
Baker, G. A. 1:199 
Beckenbach, E. F. 13 :88 
Carver, H. C. 5:73 
Craig, A. T. 2:99 4:127

Dodd, E. L. 8:12 9:153 11 :163 
12 :422 

Dwyer, P.S.11 :353 
Fréchet, M. 18 :290 
Gruzewska, H. M. 4:196 
Jenkins, T. N. 3:45 
Norris, N.6:27  8:118 

Pollard, H. S. 5:227
Welker, E. L.18:111 
Zoch, R. T. 6:171 

Measure of Random Sets 
Bronowski, J., and Neyman, J. 16 :330 
Neyman, J., and Bronowski, J. 16 :330 
Robbins, H. E. 15:70 15:321 16:342

18 :297 
Santalo, L. A. 18:37 
Votaw, D. F., Jr. 17 :240 
Median 

Mood, A. M. 12 :268 
Pollard, H. S. 5:227
Thompson, W. R. 7 :122 
Walsh, J. E. 20 :64 

Moment-Generating Function (see also
Characteristic Function)

Curtiss, J. H. 13 :430 

Kozakiewicz, W. 18 :61 

Moments 

Alter, D. 10 :192 

Carver, H. C. 4:229 

Churchill, E. 17 :244 

Dodd, E. L. 9:153 

Dwyer, P. S. 8:21 9:1 9:97 9:288 
11 :353 

Evans, W. D. 11:106 

Feldman, H. M. 6:30 

Godwin, H. J. 20 :279 

Hastings, C., Jr., Mosteller, F., Tukey, 
J. W., Winsor, C. P.18:413 

Hsu, L. C.15:399 16:369 

Jones, H. L. 19:270 

Kendall, M. G. 11:402 12:464 

Kirkham, W. J. 6:96 

Larguier, E. H. 7:191

Larson, H. D. 10:198 

Merrell, M. 4:216 

Mosteller, F., Tukey, J. W., Winsor, C. P., 
and Hastings, C., Jr. 18:413 

O’Toole, A. L. 4:1 

Riordan, J. 8:103 

Rodrigues, M. D. 16:74 

Stephan, F. F. 16:50


Tukey, J. W., Mosteller, F., Winsor, C. 
P., and Hastings, C., Jr. 18:413 

Tukey, J. W. 20 :523 

Vatnsdal, J. R. 17 :198 

Winsor, C. P., Mosteller, F., Tukey, J. 
W., and Hastings, C., Jr. 18:413 

Ziaud-Din, M. 9:63 


Multiple Correlation 


Bacon, H. M. 9:227 
Dwyer, P. 8. 8:224 
Fischer, C. H. 4:278 
Guttman, L. 9:305 
Horst, P. 3:40 
Hotelling, H. 11 :271 
Kirkham, W. J. 8:68 
Miner, J. R. 2:320 
Starkey, D. M. 10 :327 
Wherry, R. J. 2:440 
Wilks, S. S. 3:196


Multinomial Distribution (see also
Distribution Functions)
Kullback, S. 8:127 
McCarthy, P. J. 18:349 


Multivariate Statistical Theory 


Anderson, T. W., and Girshick, M. A. 15:345

Anderson, T. W.17 :409 

Bartlett, M.S. 18:1 

Brookner, R. J., and Wald, A. 12 :137 

Daly, J. F.11:1 

Finney, D. J. 17 :344 

Girshick, M. A. 10:203

Girshick, M. A., and Anderson, T. W. 15:345

Hoel, P. G. 8:149 

Hotelling, H. 2 :360 

Hsu, P. L. 9:231 

Kullback, S. 6:202

Lukomski, J. 10 :236 

Mauchly, J. W. 11 :204 

Nanda, D. N. 19:47 19 :340 

Rasch, G. 19 :262 

Reiersøl, O. 11:193

Tintner, G. 16 :304 

Votaw, D. F., Jr. 19 :447 

Wald, A., and Brookner, R. J. 12 :137 

Wilks, S. S. 3:196 17:257


Negative Binomial Distribution


Olmstead, P. S. 11 :363 


Normal Bivariate Distribution 


Anderson, T. W., and Villars, D. S. 14:141 
Hsu, C. T. 11:410 12 :279 

Oberg, E. N. 18 :442 

Paulson, E. 13:440

Villars, D. S., and Anderson, T. W. 14:141
Normal Distribution, Description and Properties
of (see also Distribution Functions)

Dixon, W. J. 19:424 

Gordon, R. D. 12 :364 

Kaplansky, I. 14:197 

Lukacs, E. 13 :91 

Williams, J. D. 17 :363 

Non-Parametric Statistical Theory (see also
Extreme Values, Order Statistics,
Ranges, Rank Correlation, Ranking
Theory, Run Theory, Matching Theory
and Tolerance Limits)

Dixon, W. J. 11:199 

Eisenhart, C., and Swed, F. S. 14:66

Friedman, M. 11 :86 

Hoeffding, W. 19 :546 

Hotelling, H. and Pabst, M. R. 7:29 

Kimball, B. F. 18 :540 

Kolmogorov, A. 12 :461 

Lehmann, E. L., and Stein, C. 20:28

Mann, H. B. 16:193 

Mann, H. B., and Whitney, D. R. 18:50 

Mathison, H. C. 14:188 

Mosteller, F. 19:58 


Scheffé, H., and Tukey, J. W. 16 :187 
Swed, F. S., and Eisenhart, C. 14:66 
Thompson, W. R. 7 :122 
Tukey, J. W., and Scheffé, H. 16 :187 
Tukey, J. W., Mosteller, F., Winsor, C. P., 
and Hastings, C., Jr. 18 :413 
Tukey, J. W. 18 :529 19:30 
Walsh, J. E.17:44 17:246 
Winsor, C. P., Mosteller, F., Tukey, J. 
W., and Hastings, C., Jr. 18:413 
Partial Correlation 
Fischer, C. H. 4:278 
Guttman, L. 9:305 
Strecker, G. 6 :143 
Path Coefficients 
Wright, S. 5:161
Pearson’s Curves 
Brown, A. W. 11 :448 
Hildebrandt, E. H. 2:379 
Kullback, S. 7:51
Salvosa, L. R.1:191 1 :274 
Periodogram Analysis 
Alter, D. 8:121 
Dodd, E. L. 1:205 
Powell, R. W. 1 :123 
Starkey, D. M. 10 :327 
Perpetual Calendar 
Roberts, J. L. 7:44 
Poisson Distribution 
Cochran, W. G. 11 :335 


10 :254 


Pabst, M. R., and Hotelling, H. 7:29 


Scheffé, H. 14:227 14 :305 
Scheffé, H. and Tukey, J. W. 16 :187 


Stein, C., and Lehmann, E. L., 20: 28 
Stewart, W. M. 12:236 
Swed, F.S., and Eisenhart, C. 14:66 


Herbach, L. H. 19:400 
Hoel, P. G. 16 :362 
Maceda, E. C. 19:414 


Tukey, J. W., and Scheffé, H. 16 :187 

Tukey, J.W.18:529 19:30 

Wald, A., and Wolfowitz, J. 10:105 11 :147 
14:378 15 :358 

Whitney, D. R., and Mann, H. B. 18:50 

Wolfowitz, J., and Wald, A. 10:105 11 :147 
14:378 15:358 

Orthogonal Polynomials 
Beale, F. 12 :97 
Order Statistics (see also Non-Parametric 

Statistical Theory) 

Eisenhart, C., and Swed, F. S. 14:66 

Epstein, B. 20 :590 

Godwin, H. J. 20:279 

Gumbel, E. J. 14:163 

Hastings, C., Jr., Mosteller, F., Tukey, 
J. W., and Winsor, C. P. 18:413 

Jones, H. L. 19:270 

Mosteller, F., Hastings, C., Jr., Tukey, J. 
W., and Winsor, C. P. 18:413 

Noether, G. E. 19 :416 


Riordan, J. 8:103 
Satterthwaite, F. E. 13 :410 
Weida, F. M. 6:102 
Polynomials (see also Orthogonal Poly- 
nomials) 
Barkey, I. H. 6:131 
Beale, F. 8 :206 
Davis, H. T. 4:155 
Hildebrandt, E. H. 2:379 
Kimball, B. F. 11 :348 
Samuelson, P. A. 12:360
Probability Functions (see Distribution
Functions)
Probability Theory (see also Contagious 
Distributions, Distribution Functions, 
Inverse Probability, Markoff Chains, 
Measure of Random Sets, Random 
Walk Problem, Renewal Theory, 
Sampling Theory, Sequential Analysis, 
and Tchebycheff’s Inequality)

Probability Theory : Combinatorial (see also 
Sampling Theory for Finite Popula- 
tions) 

Chung, K. L. 12:328 13:338 14:63 
14 :123 14 :234 

Chung, K. L., and Hsu, L. C. 16:91 

Geiringer, H.9:260 10:202 

Hsu, L. C. 16 :369 

Hsu, L. C., and Chung, K. L. 16:91 

McCarthy, P. J.18:349 

Woodbury, M. A. 20:311 

Probability Theory: Limit Theorems and 
Limiting Distributions (see also Proba- 
bility Theory: Stochastic Processes, 
and Sampling Theory) 

Cramér, H. 18 :165 

Dantzig, G. B. 10 :247 

Doob, J. L.6:160 20:393 

Dvoretzky, A. 20 :296 

Erdős, P. 20:286

Feller, W. 16 :301 16:319 19:177 

Geiringer, H. 11 :393 

Halmos, P. R. 15 :182 

Hoeffding, W. 19 :293 

Kimball, B. F. 15 :423 

Madow, W. G. 11 :125 

Mann, H. B., and Wald, A. 14:217 

Robbins, H. E. 19:72 

Scheffé, H. 18 :434 

von Mises, R. 18:309

Wald, A., and Mann, H. B.14:217 

Probability Theory: Stochastic Processes 
(see also Renewal Theory and Stochas- 
tic Equations) 

Doob, J. L. 15 :229 

Harris, T. E.19:474 

Kac, M., and Siegert, A. J. F. 18 :438 

Kendall, D. G. 19:1 

Otter, R. 20 :206 

Siegert, A. J. F., and Kac, M. 18:438 

Wald, A. 19:40 

Yosida, K. 20 :292 

Probability Theory: Miscellaneous 

Belz, M. H. 18 :604 

Birnbaum, Z. W. 13 :245 

Birnbaum, Z. W., and Zuckerman, H. S. 
15 :328 

Boas, R. P., Jr. 20 :376 

Camp, B. H. 8:90 

Chung, K. L. 19:88 

Copeland, A. H.3:143 

Curtiss, J. H. 12 :409 

Derkson, J. B. D. 10 :380

Doob, J. L. 12 :206 
Doob, J. L., and von Mises, R. 12 :215 
Epstein, B. 19:370 
Fréchet, M. 18 :288 
Girshick, M. A. 18 :235 13 :447
Godwin, H. J. 20 :127 
Gurland, J. 19 :228 
Halmos, P. R., and Savage, L. J. 20 :225 
Hartman, P., and Wintner, A. 19:389 
Hsu, P. L. 16 :204 
Huntington, E. V.10:195 
Hurwitz, H., and Kac, M. 16 :173 
Kac, M. 16 :400 
Kac, M., and Hurwitz, H. 16 :173 
Kimball, B. F. 18 :540 
Kullback, S. 7:51 
Laderman, J., and Lowan, A. N. 10 :360 
Lawther, H. P., Jr. 4:241 
Lowan, A. N., and Laderman, J. 10 :360 
Lukomski, J. 10 :236 
Marks, E. S.19:419 
Pitman, E. J. G., and Robbins, H. 20 :325 
20 :552 
Richards, P. I. 19:16 
Rietz, H. L. 7 :145 
Robbins, H. E.19:266 19:360 
Robbins, H. E., and Pitman, E. J. G. 
20 :325 20 :552 
Sard, A. 20 :612 
Savage, L. J., and Halmos, P. R. 20 :225 
Ullman, J. 15:205
von Mises, R. 10:99 12 :191 
von Mises, R., and Doob, J. L. 12 :215 
Wald, A. 9:244 15:330
Wertheimer, A. 3 :64 
Wintner, A. 18 :589 
Wintner, A., and Hartman, P. 19:389 
Zoch, R. T. 8:177 
Zuckerman, H. S., and Birnbaum, Z. W.
15:328
Quality Control (see also Sampling Inspec- 
tion) 
Howell, J. M. 20 :305 
Mosteller, F. 12 :228 
Olmstead, P. S. 11 :363 
Shewhart, W. A. 10:80 
Wolfowitz, J. 14:280 
Random Numbers 
Horton, H. B. 19:81 
Horton, H. B., and Smith, R. T., III. 20 :82 
Smith, R. T., III, and Horton, H. B. 
20 :82 
Walsh, J. E. 20 :580 
Random Sets (see Measure of Random Sets)
Random Walk Problem 
Blackwell, D., and Girshick, M. A. 17:310
David, H. T. 20:603
Girshick, M. A., and Blackwell, D. 17:310
Kac, M. 16 :62 
Range 
Gumbel, E. J. 15 :414 
Hoel, P. G. 17 :475 
Walsh, J. E. 20 :257 
Rank Correlation 
Hotelling, H., and Pabst, M. R. 7:29 
Pabst, M. R., and Hotelling, H. 7 :29 
Woodbury, M. A. 11 :358 
Ranking Theory 
Friedman, M. 11 :86 
Guttman, L. 17 :144 
Kendall, M. G., and Smith, B. B. 10 :275 
Olds, E.G. 9:133  20:117 
Smith, B. B., and Kendall, M. G. 10:275
Regression Theory 
Andersson, W. 5:81 
Bernstein, F. 8:77 
Bridger, C. A. 9:309 
Coleman, J. B. 3:79 
Dwyer, P.S. 8:224 
Eisenhart, C. 10 :162 
Ezekiel, M. 1 :275 
Finney, D. J.17 :344 
Kenney, J. F. 10:70 
Miner, J. R. 2 :320 
Quensel, C. E. 7 :196 
Tintner, G. 16 :304 
Villars, D. S. 18:596
Wald, A. 11:284
Wicksell, S. D. 1:3
Wong, Y. K. 7:47 
Rejection of Observations 
Thompson, W. R. 6:214 
Renewal Theory (see also Probability 
Theory : Stochastic Processes) 
Brown, A. W. 11 :448 
Feller, W. 12 :243 
Kendall, D. G. 19:1 
Lotka, A. J. 10:1 
19:190 
Representative Sampling (see Stratified 
Sampling) 
Run Theory 
Eisenhart, C., and Swed, F.S. 14:66 
Kaplansky, I. 16 :200 
Kaplansky, I., and Riordan, J. 16 :272 


18 :384 


18 :586 


10 :144 13 :115 


Levene, H., and Wolfowitz, J. 15:58 
Mosteller, F. 12 :228 
Mood, A. M. 11 :367 
Olmstead, P. S. 17:24
Riordan, J., and Kaplansky, I. 16 :272 
Swed, F.S., and Eisenhart, C. 14:66 
Wolfowitz, J., and Levene, H. 15:58 
Wolfowitz, J.14:280 15:97 15:163 
Sampling, Experimental 
Dunlap, H. F. 2:66 
Frankel, A., and Kullback, S. 11:209
Kullback, S., and Frankel, A. 11 :209 
Robinson, S. 4 :285 
Sampling Inspection 
Bartky, W. 14 :363 
Curtiss, J. H. 17 :62 
Dodge, H. F. 14 :264 
Grubbs, F. E. 20 :242 
Littauer, S. B., and Peach, P. 17:81 
Mood, A. M. 14:415 
Neyman, J. 12:46 
Peach, P., and Littauer, S. B. 17:81 
Silber, J. 19 :246 
Wald, A., and Wolfowitz, J. 16:30 

Sampling Theory (see also Analysis of Variance, Analysis of Covariance,
Chi-Square Test, Confidence Intervals, Contingency and Contingency Tables,
Distribution Functions, Extreme Values, Fisher’s z-Test, Multivariate
Statistical Theory, Non-Parametric Statistical Theory, Order Statistics,
Stratified Sampling, Sampling Inspection, Sequential Analysis,
Significance Tests, Snedecor’s F-Test, Statistical Hypotheses,
Student’s t-Test, and Systematic Sampling)

Sampling Theory for Finite Populations (see also Probability Theory:
Combinatorial)

Carver, H. C.1:101 1 :260 

Goodman, L. A. 20 :572 

Hansen, M.H., and Hurwitz, W. N. 14 :333 
20 :426 

Hurwitz, W. N., and Hansen, M. H. 14:333 
20 :426 

Madow, W. G.19:535 

Olds, E. G. 11 :355 

Sampling Theory of Means and Other Linear Functions

Andrews, F. C., and Birnbaum, Z. W. 20:458

Baker, G. A. 1:199 
17:366 20:123


2:333 11:219







Birnbaum, Z. W., and Andrews, F. C. 20:458
Brown, G. W., and Tukey, J. W.17:1 
Carlson, J. L. 3:86 
Craig, A. T. 3:126 4:127 
Craig, C. C. 2:99 
Dodd, S. C. 7:202
Hsu, P. L. 16:1 
Tukey, J. W., and Brown, G. W. 17:1 
Welker, E. L. 18:111 
Wilks, S. S. 3:163


Sampling Theory of Variances, Covariances, and Other Quadratic Forms
(see also Analysis of Variance)
Baker, G. A. 2:333 6 :127 11 :219 


Craig, A. T. 3:126 9:48 14:195 


18 :565 
Grubbs, F. E. 16 :75 
Hsu, C. T. 11:410 12:279


11 :125 


Pitman, E. J. G., and Robbins, H. E. 20:552
Robbins, H. E. 19 :266 


Robbins, H. E., and Pitman, E. J. G. 20:552
Welch, B. L.18:118 
Wilks, S.S. 3 :163 
Sampling Theory: Miscellaneous 
Anderson, R. L. 13:1 
Baer, R. 16 :348 
Baker, G. A. 3:1 
Brown, G. M. 4 :288 
Camp, B. H. 9:221 17 :226 
Carver, H. C.1:101 2 :82 
Craig, C. C. 2 :324 
Dieulefait, C. E. 13 :94 
Dixon, W. J.16:119 
Dwyer, P.S. 8:21 9:1 9:97 
Epstein, B. 19 :370 
Ezekiel, M. 1 :275 
Feldman, H. M. 3:20 6 :30 
Goodman, L. A. 20 :572 


Greenwood, J., and Greville, T. N. E. 10:297
Greville, T. N. E., and Greenwood, J. 10:297
Hoel, P. G. 14:289 
Kac, M. 19 :257 
Kimball, B. F. 20:110 
Kullback, S. 5 :263 


Leipnik, R. B. 18:80 
Madow, W. G. 8:159 
Molina, E. C. 17 :325 
Mood, A. M. 12 :268 
Mosteller, F. 17 :377 
Norris, N.11 :445 
Plackett, R. L. 19:575 
Quenouille, M. H. 20 :355 20 :561 
Rider, P. R. 2:48 
Shen, C. L. 7:62 
Tukey, J. W. 20 :523 
Scaling and Scales of Measurements 
Cochran, W. G. 14 :205 
Guttman, L. 17 :144 
Semi-invariants 
Craig, C. C. 2:154 11:177
Dressel, P. L. 11:33 
Sensitivity Testing 
Churchman, C. W., and Epstein, B. 15 :90 
Epstein, B., and Churchman, C. W. 15:90
Sequential Analysis 
Albert, G. E. 18:593  19:426 
Blackwell, D.17 :84 18 :105 
Blackwell, D., and Girshick, M. A. 17:310 18:277
Blom, G. 20:439
Girshick, M. A. 17:123 17:282
Girshick, M. A., and Blackwell, D. 17:310 18:277
Harris, T. E. 18 :294 
Herbach, L. A. 19:400 
Noether, G. E. 20 :455 
Paulson, E. 18 :447 
Savage, L. J. 18:295 
Sobel, M., and Wald, A. 20 :502 
Stein, C. 17 :498 
Stein, C., and Wald, A. 18:427 
Wald, A. 16:283 16:117 16 :287 
17:466 17:493 
Wald, A., and Stein, C. 18 :427 
Wald, A., and Wolfowitz, J. 19:326 
Wald, A., and Sobel, M. 20 :502 
Wolfowitz, J. 17:489 18:131 18 :215 
Wolfowitz, J., and Wald, A. 19:326 
Serial Correlation (see also Mean Square 
Successive Difference) 
Anderson, R. L. 18:1 
Dixon, W. J. 16 :119 
Koopmans, T. 13 :14 
Leipnik, R. B. 18:80 
Madow, W. G. 16 :308 
Quenouille, M. H. 20:561 
Rubin, H. 16 :211 







Wald, A., and Wolfowitz, J. 14:378 
Wolfowitz, J., and Wald, A. 14:378 
Sheppard’s Corrections 
Abernethy, J. R. 4:263 
Alter, D. 10:192 
Bartlett, M.S. 2:309 
Carver, H. C.7:154 
Craig, C. C. 7:55 
Dwyer, P. S. 13:138
Kullback, S. 6 :158 
Lewis, W. T. 6:11 
Pierce, J. A. 11:311 


12 :339 


Significance Tests (see also Statistical Hypotheses, Testing of)

Anderson, T. W., and Villars, D. S. 14:141

Baker, G. A. 6:197 12:233

Bancroft, T. A. 15 :190 

Bowker, A. H. 15:98

Brown, G. W. 10:119 

Daly, J. F. 17:71 

Dixon, W. J.11:199 

Ferris, C. D., Grubbs, F. E., and Weaver, 
C. L. 17:178 

Fertig, J. W., and Proehl, E. A. 8:193 

Friedman, M. 11 :86 

Grubbs, F. E., Ferris, C. D., and Weaver, 
C.L.17:178 

Hart, B. I. 12:153 13 :207 

Hoel, P. G. 8:149 16:362

Hotelling, H., and Pabst, M. R. 7:29 

Hsu, P. L. 16 :278 

Johnson, N. L. 11 :227 

Kendall, M. G., and Smith, B. B. 10:275 

Kent, R. H., and von Neumann, J. 12:153 

Mann, H. B. 16 :193 

Mann, H. B., and Whitney, D. R. 18:50 

Mauchly, J. W. 11 :204 

Olds, E. G. 20:117 

Olmstead, P. S., and Tukey, J. W. 18:495 

Pabst, M. R., and Hotelling, H. 7:29 

Peiser, A.M.14:56  20:128 

Proehl, E. A., and Fertig, J. W. 8:193 

Smirnov, N.19:279 

Smith, B. B., and Kendall, M. G. 10:275 

Starkey, D. M. 9:201 

Stewart, W.M.12 :236 

Tintner, G. 10 :139 

Tukey, J. W., and Olmstead, P. S. 18:495

Villars, D. S. 18:596

Villars, D.S., and Anderson, T. W. 14:141 

von Neumann, J., and Kent, R. H. 12 :153 

Wald, A., and Wolfowitz, J. 15:358 
19 :326 


13:445


Walsh, J. E. 17:44 
20:64  20:257 
Weaver, C. L., Ferris, C. D., and Grubbs, 
F. E.17:178 
Whitney, D. R., and Mann, H. B. 18:50 
Williams, J. D. 12 :239 
Wolfowitz, J., and Wald, A. 15:358
19 :326 
Wolfowitz, J. 20:540 
Young, L. C. 12 :293 
Skewness 
Dwyer, P. S. 12:104
Garver, R.3 :358 
Hotelling, H., and Solomons, L. M. 3:141 
Solomons, L. M., and Hotelling, H. 3:141 
Wilkins, J. E., Jr. 15 :333 
Snedecor’s F-Test (see also Fisher’s z- 
Test) 
Aroian, L. A. 12 :429 
Scheffé, H. 13 :371 
Statistical Decision Functions 
Paulson, E. 20:95 
Wald, A. 18 :549 


17 :358 18:88 


20 :165 


Statistical Estimation (see also Confidence 
Limits, Curve-Fitting, and Method of 
Maximum Likelihood) 

Anderson, T. W., and Rubin, H. 20:46 


Bancroft, T. A. 15:190

Brown, G. W. 18:582 

Carlson, A. G. 17 :355 

Craig, A. T. 14:88 

Daly, J. F. 12 :459 

Dressel, P.L.11:33 12:84 

Girshick, M. A., Mosteller, F., and Savage, L. J. 17:13

Goodman, L. A. 20 :572 

Gordon, R. D.12:115 

Grubbs, F. E. 18:194 

Halmos, P. R. 17 :34 

Hotelling, H. 12 :20 

Johnson, E., Jr. 11 :453 

Kimball, B. F. 17 :299 

Kullback, S. 10 :388 

Landau, H. G. 16 :219 

Mann, H.B.16:85 17:87 

Morse, A. P., and Grubbs, F. E. 18 :194 

Mosteller, F., Girshick, M. A., and Savage, L. J. 17:13

Mosteller, F. 17 :377 

Myers, R. J. 5 :320 

O’Toole, A. L. 4:79 

Rubin, H., and Anderson, T. W. 20 :46 






Savage, L. J., Girshick, M. A., and Mosteller, F. 17:13

Seth, G. R. 20:1 

Smith, J. H. 18:231 

Stephan, F. F. 13 :166 

Tukey, J. W. 20:309 

Wald, A. 10:299 

Wilks, S. S. 3:163

Will, H. S. 7 :165 

Statistical Hypotheses, Testing of (see also Analysis of Variance,
Analysis of Covariance, Chi-Square Test, Contingency, Design of
Experiments, Fisher’s z-Distribution, Matching Theory, Mean Square
Successive Difference, Rank Correlation, Ranking Theory, Run Theory,
Serial Correlation, Significance Tests, Student’s t-Test)

Berger, A., and Wald, A. 20:104 

Brookner, R. J. 16 :221 

Brown, G. W. 11 :254 

Chernoff, H. 20 :268 

Court, L. M. 16 :326 

Daly, J. F.11:1 

Dixon, W. J.11:199 

Fertig, J. W. 7:113 

Hoeffding, W. 19 :546 

Hoel, P. G. 18 :556 19 :66 

Hoel, P. G., and Peterson, R. P. 20 :433 

Hsu, C.T.11:410 12:279 

Johnson, N. L. 11 :227 

Lehmann, E.L.18:97 18:473 

Lehmann, E. L., and Stein, C. 19:495 
20 :28 

Lehmer, E. 15:388

Lengyel, B. A. 10 :365 

Mathison, H. C. 14:188 

Mood, A. M. 10:187 

Mosteller, F. 19:58 

Neyman, J. 9:69 

Paulson, E. 12 :301 

Peterson, R. P., and Hoel, P. G. 20 :433 

Scheffé, H. 13 :280 

Simon, H. 14:149 

Stein, C. 16 :243 

Stein, C., and Lehmann, E. L. 19:495 
20 :28 

von Mises, R. 14 :238 16 :68 

Wald, A.10:299 12:1 12:396 

Wald, A., and Wolfowitz, J. 11 :147 


19:40  19:220 


20 :114 


16 :117 


Wald, A., and Berger, A. 20 :104 
Wilks,S.S.6:190 9:60 
Wolfowitz, J., and Wald, A. 11 :147 
Wolfowitz, J. 13 :247 
Stieltjes Integrals 
Bartlett, M.S. 1:95 
Shohat, J.1:73 
Stochastic Equations 
Anderson, T. W., and Rubin, H. 20 :46 
Rubin, H., and Anderson, T. W. 20:46 
Stochastic Processes (see also Probability 
Theory) 
Wold, H. O. A. 19:558 
Stratified Sampling 
Anderson, P. H. 13 :42 
Cochran, W. G. 14:205 
Craig, A. T. 10:26 
Frankel, L. R., and Stock, J. S. 10 :288 
Hasel, A. A. 13 :179 
Stock, J. S., and Frankel, L. R. 10:288 
“Student’s” t-Test 
Chung, K. L. 17 :447 
Craig, C. C. 12 :224 
Daly, J. F. 17:71
Dantzig, G. B. 11:186 
Goldberg, H., and Levine, H. 17 :216 
Hendricks, W. A. 7:210 
Hotelling, H. 2 :360 
Hsu, P. L. 9:231 
Laderman, J. 10:376
Levine, H., and Goldberg, H. 17 :216 
Rietz, H. L. 10 :265 
Robbins, H. E. 19 :406 
Treloar, A. E., and Wilder, M. A. 5:324
Walsh, J. E. 18:280 18:601 19:93 
Wilder, M., and Treloar, A. E. 5 :324 
Sufficient Statistics 
Halmos, P. R., and Savage, L. J. 20:225 
Kimball, B. F. 17 :299 
Savage, L. J., and Halmos, P. R. 20:225
Tukey, J. W. 20 :309 
Welch, B. L. 10:58 
Survey Articles 
Camp, B. H. 13 :62 
Craig, C. C. 18 :74 
Cramér, H. 18 :165 
Symmetric Functions 
Dwyer, P.S.9:1 
O’Toole, A. L. 2 :102 
Systematic Sampling 
Cochran, W. G. 17 :164 
Madow, W. G., and Madow, L. 16:1 
Madow, L., and Madow, W. G. 16:1 
Madow, W. G. 20 :333 


9:97 
3 :56 







Tchebycheff’s Inequality 

Birnbaum, Z. W., Raymond, J., and Zuckerman, H. S. 18:70

Camp, B. H. 19 :568 

Craig, C. C. 4:94 

Raymond, J., Zuckerman, H. S., and Birnbaum, Z. W. 18:70

Smith, C. D. 10:190 

Zuckerman, H. S., Raymond, J., and Birnbaum, Z. W. 18:70

Teaching of Statistics 

Deming, W. E. 11 :470 

Hotelling, H. 11 :457 


Time Series 


Tintner, G. 10:139 

Wold, H. O. A. 19:558 

Tolerance Limits, Statistical 

Birnbaum, Z. W., and Zuckerman, H. S. 
20 :313 

Bowker, A. H. 17:238

Guttman, L. 19 :410 

Murphy, R. B.19:581 

Paulson, E. 14:90 

Robbins, H. E. 15 :214 

Scheffé, H., and Tukey, J. W. 15 :217 

Thompson, W. R. 9:281 

Tukey, J. W., and Scheffé, H. 15 :217 

Wald, A., and Wolfowitz, J. 17 :208 


Wald, A. 13:389 14:45

Wilks, S. S. 12:91 13:400

Wolfowitz, J. 17:483

Wolfowitz, J., and Wald, A. 17:208

Zuckerman, H. S., and Birnbaum, Z. W. 20:313


Transformation of Distribution Functions

Baker, G. A. 1:334

Curtiss, J. H. 14:107

Frankel, L. R., and Hotelling, H. 9:87

Hotelling, H., and Frankel, L. R. 9:87

Olshen, C. A. 9:176

Rietz, H. L. 2:38

Riordan, J. 20:417

§:113


Trend Analysis (see also Regression Theory)

Brennan, J. F., and Housner, G. W. 19 :380 

Housner, G. W., and Brennan, J. F. 19 :380 

Palmer, E. Z. 1:345 

Robb, R. A. 1 :352 

Toops, H. A. 5:21 


Truncated Distributions 


Andrews, F. C., and Birnbaum, Z. W. 
20 :458 

Birnbaum, Z. W., and Andrews, F. C. 
20 :458 

Keyfitz, N. 9:66 

Tukey, J. W. 20 :309 










THE INSTITUTE OF MATHEMATICAL STATISTICS 


(Organized September 12, 1935) 


OFFICERS FOR 1950 

President:
J. L. Doob, University of Illinois, Urbana
President-Elect:
P. S. Dwyer, University of Michigan, Ann Arbor
Secretary-Treasurer:
C. H. Fischer, University of Michigan, Ann Arbor
Editor:
T. W. Anderson, Columbia University, New York


The purpose of the Institute of Mathematical Statistics is to stimulate
research in the mathematical theory of statistics and to promote cooperation
between the field of pure research and the fields of application.


Membership dues including subscription to the ANNALS OF MATHEMATICAL
STATISTICS are $7.00 per year within the Western Hemisphere and $5.00 per
year elsewhere. Dues and inquiries regarding membership in the Institute 
should be sent to the Secretary-Treasurer of the Institute. 


MEETINGS OF THE INSTITUTE 


ANNUAL MEETING—CHICAGO, ILLINOIS—December 27-29, 1950. 

To be held in conjunction with the meeting of the American Statistical 
Association. Abstracts must be in the hands of Associate Secretary K. J. 
Arnold, North Hall, University of Wisconsin, Madison, Wisconsin, not 
later than November 15. 


SPRING MEETING—OAK RIDGE, TENNESSEE—March 15-17, 1951. 

To be held in conjunction with the Biometric Society. Abstracts must 
be in the hands of Associate Secretary K. J. Arnold, North Hall, University 
of Wisconsin, Madison, Wisconsin, not later than February 1.





